require(RLDNe)
require(car)
require(DHARMa)
require(emmeans)
require(MASS)
require(effects)
require(glmmTMB)
require(lme4)
require(kableExtra)
require(gt)
require(gtsummary)
require(tidyverse)
require(magrittr)
require(countreg)
require(lmerTest)
require(lubridate)
require(khroma)

Summary

This notebook contains a log of all analyses for the 2022 South Fork McKenzie River spring Chinook salmon genetic pedigree study. Inference of the pedigree used here is conducted in a separate notebook titled “parentage_notebook” in this same repository.

This is an R notebook. The .html version of this file is a fully rendered and interactive log. To view it, save the html and open in a browse. The .rmd version can be opened within R studio. To reproduce results or edit the analysis: clone the full repository onto your local machine and open the r project in rstudio. This will provide all needed data and objects.

Pedigrees and Cohort Rationale

This notebook relies on a pedigree of all Chinook Salmon released above Cougar Dam on the South Fork McKenzie River from 2007 - 2017 using potential offspring sampled from 2010 to 2020.

Summary of Cohort Years

Previous reports and manuscripts evaluating the reintroduction of Chinook salmon above Cougar Dam on the South Fork McKenzie river have considered NOR salmon sampled from 2010 to 2015 as potential offspring of salmon released above Cougar Dam from 2007 - 2012.

Most Chinook salmon on the South Fork McKenzie express an age at maturity of 3 - 6 years. Therefore, previous reports (relying on 2010 - 2015 NOR returns) have inferred a pedigree for salmon released above the dam from 2007 - 2009. Results based on the pedigree of salmon released above Cougar Dam in 2010 were also provided along with the caveat that age 6 offspring were not yet evaluated and some results such as Total Lifetime Fitness and Cohort Replacement Rate were likely underestimates.

Continuing this work, we have since genotyped NOR salmon sampled on the South Fork McKenzie from 2016 to 2020, as well as salmon released above Cougar dam from 2013 to 2017. These new data allow us to complete the pedigree of salmon released above Cougar Dam in 2010, infer full pedigrees TLF for salmon released above Cougar Dam from 2011 - 2014, and partial pedigrees for salmon released above the dam from 2015 - 2017.

Inferring Pedigrees Again

Irrevocable updates to software packages used to assign parentage (COLONY) prevent us from exactly reproducing the approach used to infer the pedigrees used in previous reports. We chose to infer all pedigrees from parent years 2007 to 2017 (offspring year 2010 - 2020) from the raw genetic data using a consistent approach rather than simply combining previous pedigrees with those inferred from new data.

This means that parent year results from 2007-2010 and offspring year results from 2010 - 2015, that have previously been reported on may change. In the parentage assignment notebook we evaluated how much changes to the parentage assignment approach affected assignment rate, cohort replacement rate. The overall assignment rate differed an average of 0.7% across the six years where pedigrees were previously inferred.

This approach has several advantages:
(1) Trends - Results based on the pedigrees such as Cohort Replacement Rate and Total Lifetime Fitness suffer from the same biases across all years, allowing more confidence in the identification of year - year trends.
(2) Fitness modeling - Similar to above, applying a consistent approach to pedigree inference gives us more power to identify predictors of fitness by allowing us to combine data from more years into a single analysis
(3) Grandparentage and Great-Grandparentage: If a unified filtering and pedigree inference approach is used for the entire dataset, we can more confidently combine pedigrees inferred for a single offspring year with one another, allowing for a 3 (or perhaps even 4) generation pedigree. This means we can identify gradnparentage and great grandparentage, opening up the potential to address many important questions such as fitness effects of hatchery selection.
(4) Minor Issues - In reproducing previous results, minor errors in code were identified. While these errors ultimately are expected to have little effect on the final pedigrees and results, inferring the pedigrees again allows us to fix these small errors and evaluate their effects. For example in the Cervus pedigrees, the software is run using a single parent year and a single offspring year, then the results are concatenated for all parent years within an offspring year. This renders the likelihoods incomparable. Likelihoods are used to break ties when inferring the cervus pedigree, but it likely never had a large effect because Colony is given priority in the consensus pedigree.

Throughout the notebook we present the results for all parent years using the newly inferred pedigrees.

Goals

  1. Summarise dataset and release history
  2. Summarise assignment of offspring to parents, use to infer age structure.
  3. Estimate total lifetime fitness (TLF) for parent cohorts
  4. Estimate cohort replacement rates for parent cohorts
  5. Assess variables that influence fitness with general linear models
  6. Estimate effective number of breeders using NeEstimator
  7. Compare HOR and NOR fitness

This is an R notebook. The .html version of this file is a fully rendered and interactive log. To view it, save the html and open in a browse. The .rmd version can be opened within R studio. To reproduce results or edit the analysis: clone the full repository onto tyour local machine and open the r project in rstudio. This will provide all needed data and objects.

Data

Genotype and Metadata
A log of the work to consolidate sample metadata can be found in the cougar_trap_metadata_mgmt and the meta_data_consolidation R notebooks in the metadata directory of this repository. There are also readmes in that directory that explain subdirectory structure and files.

The log (R notebook) for genotype data prep is titled genotype_data_prep and can be found in the genotypes directory of this repository.

Here, we import the final outputs of this notebook containing both sample metadata and genotype which include (by R environment name):

  1. full_unfilt: All genotypes and metadata, no genotype quality filtering. Includes all individuals known to be potential parents or offspring for the project.
  2. full_data_1.0: Genotypes and metadata, after removing individuals with fewer than 7 scored genotypes and removing duplicates.
load("../genotypes/genotype_data/full_unfiltered_dataset.R")
load("../genotypes/genotype_data/full_filtered_dataset.R")

Both files also contain some individuals twice (for example: Cougar trap LSDR samples have two rows, one for each observation at the Cougar Trap). So, let’s also produce some objects that only include the final observations for individuals.

  1. dedup_unfilt: same as full_unfilt above, but only last observation of an individual retained.
  2. dedup: same as full_data_1.0 above, but only second observation of an individual retained.
dedup <- full_data_1.0 %>%
  group_by(sample_id) %>%
  slice_max(date, with_ties = FALSE) %>%
  ungroup()

dedup_unfilt <- full_unfilt %>%
  group_by(sample_id) %>%
  slice_max(date, with_ties = FALSE) %>%
  ungroup()

Pedigree

The log for inferring final consensus pedigree can be found in the parentage_notebook in the parentage directory of this repository. It includes parentages for all potential offspring sampled from 2010 - 2020.

Here we import the final consensus pedigree. Note that this pedigree only has rows for assigned parentages, offspring with no parentage assigned do not appear in this object.

load("../parentage/pedigree.R")

Dataset and Release Summary Counts

Let’s summarise the complete dataset, by source and origin, without respect to where the fish wound up.

kable(dedup %>% count(year, type, origin, section) %>% 
  pivot_wider(id_cols = year, names_from = c(type, origin, section), values_from = n, names_sep = "\n"), caption = "Table N0: sample sizes by source", align = "c") %>%
  kable_classic(full_width = T, html_font = "Arial" )
Table N0: sample sizes by source
year hatchery_outplant HOR NA cougar_trap HOR NA cougar_trap NOR NA sgs NOR SFMK_below cougar_trap NA NA precocial_male NOR NA sgs NOR SFMK_above
2007 745 NA NA NA NA NA NA
2008 873 NA NA NA NA NA NA
2009 1383 NA NA NA NA NA NA
2010 496 30 221 NA NA NA NA
2011 340 29 356 45 NA NA NA
2012 430 17 500 10 NA NA NA
2013 440 22 223 5 NA NA NA
2014 486 21 191 14 1 12 1
2015 600 19 241 24 NA NA NA
2016 459 74 295 31 4 NA 4
2017 448 6 239 13 1 NA NA
2018 NA NA 120 35 NA NA NA
2019 NA NA 158 15 NA NA NA
2020 NA NA 162 NA NA NA NA

Table N0: Sample sizes after filtering by year, origin and source. Source column names refer to three values, separated by spaces: (1) where was the sample originally encountered (hatchery, cougar trap, during a spawning ground survey, or as a precocial male), (2) origin, and (3) for SGS samples only, was the SGS conducted above or below Cougar Dam

Parents

Counts and Sex Ratios

Throughout this report (and previous reports), it has been useful to split the dataset up and present results by offspring and candidate parents. Let’s keep with this approach and present the numbers of potential parents (table N1) and the numbers of potential offspring table N7 .

Here we only present the numbers for final filtered dataset for two reasons:

  1. In later analyses where we calculate demographic parameters such as CRR, we define the number of candidate parents as the number of potential parents passed to Colony or Cervus, to which we assign potential offspring. Keeping this consistent will make these tables less confusing down the road.
  2. Duplicates: There are many individuals filtered from the raw dataset because they have identical genotypes to another sample. This is potentially due to tissue samples being stored in a shared container (one fin clip gets split into multiple pieces), or spawning ground surveys sampling the same individual that was already sampled at the Cougar Trap. Including these individuals in the number of parents is misleading.
#unfilt_parent_summary <- dedup_unfilt %>%
#  filter(cand_parent == TRUE) %>%
#  count(year, type, origin) %>%
#  rename(n_unfiltered = n)

# let's add a column for sex ratio

filt_parent_summary <- dedup %>%
  filter(cand_parent == TRUE) %>%
  count(year, type, origin) %>%
  rename(n_final = n)

filt_parent_summary %<>%
  unite("type_origin", type, origin) %>%
  pivot_wider(id_cols = year, names_from = type_origin, values_from = n_final)

options(knitr.kable.NA = '')
kable(filt_parent_summary, caption = "Table N1: Number of Candidate Parents") %>%
  kable_classic(full_width = T, html_font = "Arial" )
Table N1: Number of Candidate Parents
year hatchery_outplant_HOR cougar_trap_HOR cougar_trap_NOR cougar_trap_NA precocial_male_NOR sgs_NOR
2007 745
2008 873
2009 1383
2010 496 30 221
2011 340 29 356
2012 430 17 500
2013 440 15 172
2014 486 20 132 1 12 1
2015 600 19 135
2016 459 70 171 1 4
2017 448 5 151 1
2018 68
2019 65
2020 89
# there are two 

Table N1: Number of individuals relased above Cougar dam retained in the final filtered dataset. These values correspond to candidate parents used in the assignments. “Hatchery Outplant HOR” refers to individuals trapped at either McKenzie or Leaburg hatchery and released above the dam, all are HOR. Cougar Trap includes both NOR, HOR and unknown origin (NA) individuals released above the dam. Precocial males refers to a small set of jacks sampled on the spawning grounds above the dam in 2014. SGS refers to additional individuals sampled during spawning ground surveys above the dam.

Let’s also split up the parent cohorts by sex. Here we split into different tables to make them more manageable.

kable(dedup %>%
  filter( cand_parent == TRUE, year < 2018) %>%
  count(year, sex) %>%
  pivot_wider(id_cols = year, names_from = sex, values_from = n) %>%
  mutate(sex_ratio = M/`F`), caption = "Table N2: Sex Ratios of all Candidate Parents", align = "c", digits = 2 ) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N2: Sex Ratios of all Candidate Parents
year F M sex_ratio
2007 318 427 1.34
2008 288 585 2.03
2009 603 780 1.29
2010 263 484 1.84
2011 320 405 1.27
2012 439 508 1.16
2013 325 302 0.93
2014 386 266 0.69
2015 465 289 0.62
2016 403 302 0.75
2017 370 235 0.64

Table N2: Number of candidate parents females (F) and males (M) (individual released above Cougar Dam and retained in the final dataset). Here we split by source, individuals are either from Cougar Trap (NOR and HOR) or from the hatchery (HOR). Sex ratio is presented as n_male/n_female.

kable(dedup %>%
  filter(type == "hatchery_outplant", cand_parent == TRUE) %>%
  count(year, sex) %>%
  pivot_wider(id_cols = year, names_from = sex, values_from = n) %>%
  mutate(sex_ratio = M/`F`), caption = "Table N3: Sex Ratios of Candidate Parents from Hatchery", align = "c", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial"  )
Table N3: Sex Ratios of Candidate Parents from Hatchery
year F M sex_ratio
2007 318 427 1.34
2008 288 585 2.03
2009 603 780 1.29
2010 201 295 1.47
2011 170 170 1.00
2012 248 182 0.73
2013 239 201 0.84
2014 327 159 0.49
2015 412 188 0.46
2016 315 144 0.46
2017 325 123 0.38
kable(dedup %>%
  filter(type == "cougar_trap", cand_parent == TRUE, year < 2018) %>%
  count(year, sex) %>%
  pivot_wider(id_cols = year, names_from = sex, values_from = n) %>%
  mutate(sex_ratio = M/`F`), caption = "Table N4: Sex Ratios of Candidate Parents from Cougar Trap", align = "c", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial"  )
Table N4: Sex Ratios of Candidate Parents from Cougar Trap
year F M sex_ratio
2010 62 189 3.05
2011 150 235 1.57
2012 191 326 1.71
2013 86 101 1.17
2014 59 94 1.59
2015 53 101 1.91
2016 86 156 1.81
2017 45 112 2.49

Tables N3 - N4: Number of candidate parents females (F) and males (M) (individual released above Cougar Dam and retained in the final dataset). Here we split by source, individuals are either from Cougar Trap (NOR and HOR) or from the hatchery (HOR). Sex ratio is presented as n_male/n_female.

Cougar Trap LSDR / Recycling

Finally, let’s summarise the approaches used for releasing individuals from the Cougar Trap.

Some terms to define:
(1) Late-season downstream release (LSDR): Individuals that arrive at the Cougar Trap on and after September 1st are double floy tagged and released downstream at Forest Glen. If they return to the trap a second time, they are released above the dam.
(2) Recycling: All individuals at the Cougar Trap, regardless of date, are first released downstream of the dam (either in Forest Glen, or just the dam tailrace). If they return a secodn time they are released above the dam.

LDSR / Recycling Summary During 2010, 2011 and 2012, neither LSDR or recycling was used. All fish that were trapped and survived were released above the dam.
During 2013 and 2014, the LSDR approach was applied to NOR fish after September 1st. Some HORs on and after September 1st were also LSDR but not all.
From 2015 onwards all NOR are recycled. Some HOR are also recycled, but most are not.

Let’s collect some numbers here.

kable(full_data_1.0 %>%
  filter(type == "cougar_trap", origin == "NOR", year %in% c(2013, 2014)) %>%
    filter(date %within% interval(ymd("2013-09-01"), ymd("2013-12-01")) |  date %within% interval(ymd("2014-09-01"), ymd("2014-12-01"))) %>%
  count(year, recapture) %>%
  pivot_wider(id_cols = year, names_from = recapture, values_from = n) %>%
  mutate("percent recaptured" = (RE/NEW)*100) %>%
  rename("n new" = NEW, "n recaptured" = RE)  , digits = 1, caption = "Table N5: LSDR Program Summary", align = "c") %>%
  kable_classic(full_width = F, html_font = "Arial" )
Table N5: LSDR Program Summary
year n new n recaptured percent recaptured
2013 64 15 23.4
2014 75 16 21.3

Table N5: LSDR summary: Number of NOR fish sampled at Cougar trap on or after September 1st (n new), and the number of these fish that returned a second time.
note: Banks 2016 tech report states that 74 (not 75 as presented here) NOR fish returned on or after September 1st in 2014. I went back to Nick’s data to make sure I didn’t mess something up. His data also has 75, not 74 individuals. The discrepancy stems from a single individual in 2014 with only 7 loci scored. This individual is in Nick’s raw data, but not in the pedigree file used to calculate these numbers in the report, due to missingness. Here we apply a different filtering approach (details in genotype data prep), so the individual is retained.

kable(full_data_1.0 %>%
  filter(type == "cougar_trap", origin == "NOR", year > 2014) %>%
    group_by(sample_id) %>% # a couple individuals are recorded three times, this will break these counts, just get the first two instances
    slice_max(date, n = 2, with_ties = FALSE) %>%
    ungroup() %>%
  count(year, recapture) %>%
  pivot_wider(id_cols = year, names_from = recapture, values_from = n) %>%
  select(-`NA`) %>% # one NA individual with no metadata
  mutate("percent recaptured" = (RE/NEW)*100) %>%
  rename("n new" = NEW, "n recaptured" = RE)  , digits = 1, caption = "Table N6: Recycling Program Summary", align = "c") %>%
  kable_classic(full_width = F, html_font = "Arial" )
Table N6: Recycling Program Summary
year n new n recaptured percent recaptured
2015 239 136 56.9
2016 294 172 58.5
2017 239 151 63.2
2018 120 73 60.8
2019 157 67 42.7
2020 161 90 55.9

Table N6: Recycling Summary: Number of NOR fish sampled at Cougar trap, and the number of these fish that returned a second time.

Offspring

Now let’s summarise the potential offspring. Once again we present the counts from the final filtered dataset, because these are the values we use to calculate demographic results such as CRR, and there may be duplicates in the unfiltered dataset due to batch sampling or carcass sampling.

kable(dedup %>%
  filter(origin == "NOR") %>%
  count(year, type) %>%
  pivot_wider(id_cols = year, names_from = type, values_from = n), caption = "Table N7: Potential Offspring Counts") %>%
  kable_classic(full_width = F, html_font = "Arial" )
Table N7: Potential Offspring Counts
year cougar_trap sgs precocial_male
2010 221
2011 356 45
2012 500 10
2013 223 5
2014 191 15 12
2015 241 24
2016 295 35
2017 239 13
2018 120 35
2019 158 15
2020 162

Table N7: Number of potential offspring (NOR individuals after filtering), sampled at the Cougar Trap, during spawning ground surveys, or as precocial males observed on spawning grounds above the dam. Note that spawning ground survey counts include individuals from surveys above the dam as well (1 in 2014 and 4 in 2016)

Assignments

Here we summarise assignment of offspring to parents, infer age structure and evaluate the efficacy of LSDR and recycling programs at excluding NOR immigrants from above the dam.

For summarizing assignments we’ll start simply, then get more complex.

Assignments, by Offpsring Year

We’ll start simply. How many offspring in a given year assign to parents above the dam

# let's get the metadata on the pedigree
pedigree_meta <- dedup %>%
  select(sample_id, year, type, date) %>%
  rename_with(.fn = ~ paste0("offspring_", .x)) %>%
  right_join(pedigree, by = c("offspring_sample_id" = "offspring_sample_id"))

#father
pedigree_meta <- dedup %>%
  select(sample_id, year, type, date) %>%
  rename_with(.fn = ~ paste0("father_", .x)) %>%
  right_join(pedigree_meta, by = c("father_sample_id" = "father")) %>%
  rename(father = father_sample_id)

#mother
pedigree_meta <- dedup %>%
  select(sample_id, year, type, date) %>%
  rename_with(.fn = ~ paste0("mother_", .x)) %>%
  right_join(pedigree_meta, by = c("mother_sample_id" = "mother")) %>%
  rename(mother = mother_sample_id)

pedigree_meta %<>%
  mutate(parent_year = (coalesce(father_year, mother_year)))
kable(pedigree_meta %>%
  group_by(offspring_year) %>%
  summarise(n = n(), assigned_n = n() - sum(mother == "none" & father == "none", na.rm = TRUE), assn_rate = (n()-sum(mother == "none" & father == "none", na.rm = TRUE))/n()) %>%
  mutate(assignment_percentage = assn_rate*100) %>%
  select(offspring_year,n_offspring = n, n_assigned = assigned_n, assignment_percentage), digits = 1, caption = "Table N8: Assignment rates per year") %>%
kable_classic(full_width = F, html_font = "Arial" )
Table N8: Assignment rates per year
offspring_year n_offspring n_assigned assignment_percentage
2010 221 14 6.3
2011 401 140 34.9
2012 510 328 64.3
2013 228 153 67.1
2014 218 118 54.1
2015 265 179 67.5
2016 330 228 69.1
2017 252 173 68.7
2018 155 100 64.5
2019 173 84 48.6
2020 162 152 93.8

Table N8: Number of potential offspring that assign to parents above the dam. Potential offspring are defined as any individual sampled (below dam, at Cougar Trap, or above dam), that are NOR.

People may be interested in just how many individuals show up at the trap, not ALL potential offspring, this will also be useful for comparing our assignmetn rates with the previously published ones (which don’t include carcass samples). We will consider this in greater detail in the section Recycling, but let’s make revised table split by source.

kable(pedigree_meta %>%
  group_by(offspring_year) %>%
    filter(offspring_type == "cougar_trap") %>%
  summarise(n = n(), assigned_n = n() - sum(mother == "none" & father == "none", na.rm = TRUE), assn_rate = (n()-sum(mother == "none" & father == "none", na.rm = TRUE))/n()) %>%
  mutate(assignment_percentage = assn_rate*100) %>%
  select(offspring_year,n_offspring = n, n_assigned = assigned_n, assignment_percentage), digits = 1, caption = "Table N8: Assignment rates per year, Cougar Only") %>%
kable_classic(full_width = F, html_font = "Arial" )
Table N8: Assignment rates per year, Cougar Only
offspring_year n_offspring n_assigned assignment_percentage
2010 221 14 6.3
2011 356 138 38.8
2012 500 326 65.2
2013 223 153 68.6
2014 191 117 61.3
2015 241 176 73.0
2016 295 212 71.9
2017 239 169 70.7
2018 120 92 76.7
2019 158 80 50.6
2020 162 152 93.8

Table N8b: Number of potential offspring sampled intially at Cougar Trap that assign to parents above the dam. Potential offspring are defined as any individual sampled (below dam, at Cougar Trap, or above dam), that are NOR.

NOTE BUG My values are a little different here than when I calculated similar ones in the parentage notebook. I msitakenly included to the precocial males in the parentage notebook. After removing these, the mean absolute difference in assignment rates is even smaller (0.7% vs 1.5%).

#getting that mean differnce in assignment rates number
(0.063-(13/221)+0.388-(136/357)+0.652-(320/500)+0.686-(151/223)+.613-(117/191)+.73-(173/240))/6

AAM: Assignments, by Offspring Year and Parent Year

Here we get more complex and split the assignment summaries according to parent year. This allows us evaluate age at maturity (AAM) and split each offspring year into age classes.

First a table with everything.

kable(pedigree_meta %>%
  filter(!(is.na(parent_year))) %>% # exclude non-assigments
  mutate(age = as.numeric(offspring_year) - as.numeric(parent_year)) %>%
  group_by(offspring_year, age) %>%
  summarise(n = n()) %>%
  mutate(percent = 100*(n/sum(n))), digits = 0, caption = "Table N9: Age Structure, by Offspring Year") %>%
  kable_classic(full_width = F, html_font = "Arial") %>%
  kable_styling(fixed_thead = T) %>%
  scroll_box(height = "400px")
Table N9: Age Structure, by Offspring Year
offspring_year age n percent
2010 3 14 100
2011 3 5 4
2011 4 135 96
2012 3 2 1
2012 4 151 46
2012 5 175 53
2013 3 2 1
2013 4 55 36
2013 5 89 58
2013 6 7 5
2014 3 4 3
2014 4 57 48
2014 5 55 47
2014 6 2 2
2015 3 5 3
2015 4 140 78
2015 5 32 18
2015 6 2 1
2016 3 2 1
2016 4 83 36
2016 5 140 61
2016 6 3 1
2017 3 1 1
2017 4 92 53
2017 5 80 46
2018 3 3 3
2018 4 41 41
2018 5 53 53
2018 6 3 3
2019 3 2 2
2019 4 47 56
2019 5 31 37
2019 6 4 5
2020 4 133 88
2020 5 19 12

Table N9: Age of offspring in each offspring year.

Let’s also present this as a figure.

aam_data <- pedigree_meta %>%
  mutate(parent_year = (coalesce(father_year, mother_year))) %>%
  filter(!(is.na(parent_year))) %>%
  mutate(age = as.numeric(offspring_year) - as.numeric(parent_year)) %>%
  mutate(age = as.factor(age), offspring_year = as.factor(offspring_year)) 


ggplot(data = aam_data) + 
  geom_bar(aes(offspring_year, fill = age), position = position_dodge(preserve = 'single'))+scale_fill_viridis_d(name = "Age")+scale_color_viridis_d(name = "Age")+theme_bw()+xlab("Offspring Year")+scale_x_discrete(labels = c("2010*", "2011*", "2012*", "2013", "2014", "2015", "2016", "2017", "2018", "2019", "2020"))

Figure N1 Number of age 3, 4, 5 and 6 individuals in each offspring year.
* note earliest parent year in dataset is 2007, so only offpsring years from 2013 and later could possibly have all offspring ages

Averages
Finally let’s produce a summary table of age structure, averaged across all years where we can assign ages for all offspring (2013 - 2020). This will come in handy later.

kable(aam_data %>%
  filter(!(offspring_year %in% c("2010" ,"2011", "2012"))) %>%
  count(age) %>%
  summarise(age = age, n = n, proportion = n / sum(n)), caption = "Table N10: Mean Age Structure", align = "c", digits = 3,) %>%
kable_classic(full_width = F, html_font = "Arial" )
Table N10: Mean Age Structure
age n proportion
3 19 0.016
4 648 0.546
5 499 0.420
6 21 0.018

Table N10: Mean age structure of returning NOR offspring from 2013-2020. This includes all years we have the data to assign to parents in 3, 4, 5, and 6 years prior.

Age 6 offspring compose just ~2% of TLF, but age 5 offspring compose 42%. So we can probably comfortably use our dataset to describe TLF for 2007 - 2015 parent years. Fitness of parents in our pedigree (up to 2020 offspring year) is likely to strongly underestimate 2016 and 2017 TLF.

Assignments, by Offspring Year and Parent Type/Year

Here we summarise assignment of offspring to different parent types and year.

Each below table represents a single pair of offspring and parent years. The type of offspring is listed in the first column and the results are split between parent types along the remaining columns (e.g. same format as table 3 from NSNT report)

When the “type” of parent varies between two parents, the female parent type is listed first. For example, if an offspring is assigned to a reintroduced mother and carcass father, the column would be called “reintro/carcass.”

Note that some tables are missing e.g. 2017 offspring assigned to 2011 parents. This is because there are no such assignments in the pedigree, or only a single assigment.

# the format of the table in the report is difficult to produce in r
# let's not change the format of the table, instead we'll write some helper functions for filling it out 

#let's add a column for the type of assignment, and one for combined types
pedigree_meta %<>%
  mutate(assn_type = case_when((mother == "none" & father == "none") ~ "none",
                               (mother == "none" & father != "none") ~ "male_only",
                               (mother != "none" & father == "none") ~ "female_only",
                               (mother != "none" & father != "none") ~ "pair",)) %>%
  mutate(parent_type = case_when((father_type == mother_type) ~ father_type,
                                 (is.na(father_type) & !(is.na(mother_type))) ~ mother_type,
                                  (is.na(mother_type) & !(is.na(father_type))) ~ father_type,
                                   (father_type != mother_type) ~ paste(mother_type, father_type, sep = "/")))

# function
t4_helper <- function(p_year, off_year){pedigree_meta %>%
  filter(offspring_year == off_year) %>%
  filter(parent_year == p_year) %>%
  mutate(parent_type = as.factor(parent_type)) %>%
  select(offspring_type, parent_type, assn_type) %>%
  tbl_strata(
    strata = parent_type,
    .tbl_fun = ~ .x %>%
      tbl_summary( by = assn_type, percent = NULL)
  ) %>%
  modify_caption(paste(paste("parent year: ", p_year), paste("offspring year: ", off_year), sep = "  ,")) %>%
  as_kable_extra() %>%
  kable_classic(full_width = F, html_font = "Arial")
}

Note that we’re only presenting these tables for the new offspring years.

#t4 helper function example

t4_helper("2010", "2016")
parent year: 2010 ,offspring year: 2016
cougar_trap
hatchery_outplant
Characteristic male_only, N = 1 female_only, N = 2
offspring_type
sgs 1 (100%) 2 (100%)
1 n (%)
t4_helper("2011", "2016")
parent year: 2011 ,offspring year: 2016
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 10 male_only, N = 15 pair, N = 31 pair, N = 7 female_only, N = 15 male_only, N = 8 pair, N = 10 pair, N = 44
offspring_type
cougar_trap 10 (100%) 14 (93%) 30 (97%) 7 (100%) 14 (93%) 7 (88%) 10 (100%) 42 (95%)
sgs 0 (0%) 1 (6.7%) 1 (3.2%) 1 (6.7%) 1 (12%) 0 (0%) 2 (4.5%)
1 n (%)
t4_helper("2012", "2016")
parent year: 2012 ,offspring year: 2016
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 4 male_only, N = 7 pair, N = 18 pair, N = 17 female_only, N = 1 male_only, N = 3 pair, N = 3 pair, N = 30
offspring_type
cougar_trap 2 (50%) 6 (86%) 17 (94%) 17 (100%) 1 (100%) 3 (100%) 3 (100%) 29 (97%)
sgs 2 (50%) 1 (14%) 1 (5.6%) 1 (3.3%)
1 n (%)
t4_helper("2013", "2016")
parent year: 2013 ,offspring year: 2016
cougar_trap
hatchery_outplant
Characteristic male_only, N = 1 male_only, N = 1
offspring_type
sgs 1 (100%) 1 (100%)
1 n (%)
# t4_helper("2011", "2017") none so this breaks
t4_helper("2012", "2017")
parent year: 2012 ,offspring year: 2017
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 8 male_only, N = 14 pair, N = 24 pair, N = 4 female_only, N = 5 male_only, N = 3 pair, N = 2 pair, N = 20
offspring_type
cougar_trap 8 (100%) 12 (86%) 23 (96%) 4 (100%) 5 (100%) 3 (100%) 2 (100%) 20 (100%)
sgs 0 (0%) 2 (14%) 1 (4.2%)
1 n (%)
t4_helper("2013", "2017")
parent year: 2013 ,offspring year: 2017
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 9 male_only, N = 7 pair, N = 15 pair, N = 9 female_only, N = 14 male_only, N = 8 pair, N = 15 pair, N = 15
offspring_type
cougar_trap 9 (100%) 7 (100%) 15 (100%) 9 (100%) 14 (100%) 8 (100%) 15 (100%) 14 (93%)
sgs 1 (6.7%)
1 n (%)
# t4_helper("2014", "2017") only one so this breaks

t4_helper("2012", "2018")
parent year: 2012 ,offspring year: 2018
cougar_trap
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic male_only, N = 1 male_only, N = 1 pair, N = 1
offspring_type
cougar_trap 1 (100%) 1 (100%) 1 (100%)
1 n (%)
t4_helper("2013", "2018")
parent year: 2013 ,offspring year: 2018
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 1 male_only, N = 1 pair, N = 22 pair, N = 3 female_only, N = 2 male_only, N = 1 pair, N = 5 pair, N = 18
offspring_type
cougar_trap 0 (0%) 0 (0%) 22 (100%) 2 (67%) 1 (50%) 0 (0%) 5 (100%) 17 (94%)
sgs 1 (100%) 1 (100%) 0 (0%) 1 (33%) 1 (50%) 1 (100%) 0 (0%) 1 (5.6%)
1 n (%)
t4_helper("2014", "2018")
parent year: 2014 ,offspring year: 2018
cougar_trap
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 1 male_only, N = 1 pair, N = 6 female_only, N = 3 male_only, N = 5 pair, N = 15 pair, N = 10
offspring_type
cougar_trap 1 (100%) 1 (100%) 6 (100%) 3 (100%) 5 (100%) 15 (100%) 9 (90%)
sgs 1 (10%)
1 n (%)
t4_helper("2015", "2018")
parent year: 2015 ,offspring year: 2018
cougar_trap
hatchery_outplant
Characteristic male_only, N = 1 female_only, N = 1 male_only, N = 1
offspring_type
cougar_trap 1 (100%) 0 (0%) 1 (100%)
sgs 1 (100%) 0 (0%)
1 n (%)
t4_helper("2013", "2019")
parent year: 2013 ,offspring year: 2019
cougar_trap
hatchery_outplant
Characteristic male_only, N = 2 male_only, N = 2
offspring_type
sgs 2 (100%) 1 (50%)
cougar_trap 1 (50%)
1 n (%)
t4_helper("2014", "2019")
parent year: 2014 ,offspring year: 2019
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic male_only, N = 2 pair, N = 1 pair, N = 5 female_only, N = 4 male_only, N = 4 pair, N = 7 pair, N = 8
offspring_type
cougar_trap 2 (100%) 1 (100%) 5 (100%) 4 (100%) 4 (100%) 7 (100%) 8 (100%)
1 n (%)
t4_helper("2015", "2019")
parent year: 2015 ,offspring year: 2019
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic pair, N = 4 pair, N = 6 female_only, N = 7 male_only, N = 2 pair, N = 16 pair, N = 12
offspring_type
cougar_trap 4 (100%) 6 (100%) 7 (100%) 1 (50%) 16 (100%) 12 (100%)
sgs 0 (0%) 1 (50%) 0 (0%)
1 n (%)
t4_helper("2016", "2019")
parent year: 2016 ,offspring year: 2019
cougar_trap
hatchery_outplant
Characteristic male_only, N = 1 female_only, N = 1
offspring_type
cougar_trap 1 (100%) 1 (100%)
1 n (%)
# t4_helper("2014", "2020") none so this breaks
t4_helper("2015", "2020")
parent year: 2015 ,offspring year: 2020
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic male_only, N = 1 pair, N = 4 pair, N = 1 pair, N = 6 pair, N = 7
offspring_type
cougar_trap 1 (100%) 4 (100%) 1 (100%) 6 (100%) 7 (100%)
1 n (%)
t4_helper("2016", "2020")
parent year: 2016 ,offspring year: 2020
cougar_trap
cougar_trap/hatchery_outplant
hatchery_outplant
hatchery_outplant/cougar_trap
Characteristic female_only, N = 2 male_only, N = 7 pair, N = 14 pair, N = 12 female_only, N = 6 male_only, N = 2 pair, N = 47 pair, N = 43
offspring_type
cougar_trap 2 (100%) 7 (100%) 14 (100%) 12 (100%) 6 (100%) 2 (100%) 47 (100%) 43 (100%)
1 n (%)
# t4_helper("2017", "2020") none so this breaks

Recycling

In this section we make the assumption that offspring that do not assign to any candidate parent are not offspring of salmon above the dam, and are instead NOR immigrants (i.e. strays) from the mainstem of SFMK below dam populations.

Under this assumption we can evaluate the efficacy of the recycling or LSDR approach at excluding NOR immigrants from being released above the dam. We exclude offspring year 2010 - 2012, since we do not have all of the potential parents from these offspring years included as candidate parents in our parentage analysis.

note that this section previously used the dedup dataset, which only considers the final encounter with a fish, not the first in the case of fish that return to the trap two times, but is this really the best way to go? arguably, we should use the first. This has been corrected in this version of the notebook

GLM

If we think about this for a while the essential question being asked is:

On a given day, what is the probability that an NOR individual that arrives at the Cougar trap is not assigned to a parent above the dam?

We term this probability “probability of immigrant”. We can model “probability of immigrant” as the response variable in a binomial GLM, using the explanatory variable julian date, and (optionally) including random effect of year. We are also interested in the effect of sex here. It is covariate that we expect will reduce overdispersion in the model fit because we expect the relationship between straying and date to depend on sex.

We could get a lot more detailed with our modeling here, but let’s attempt to model probability of immigrant as a function of julian day and sex.

First let’s plot make some simple plots.

dedup_first_encounter <- full_data_1.0 %>%
  group_by(sample_id) %>%
  slice_min(date, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  rename(date_first = date)


strays <- pedigree_meta %>% 
  filter(offspring_year > 2012, offspring_type == "cougar_trap") %>%
  left_join(select(dedup_first_encounter, sample_id, sex, date_first), by = c("offspring_sample_id" = "sample_id")) %>%
  mutate(immigrant = case_when(assn_type == "none" ~ "immigrant",
                               TRUE ~ "return"),
         immigrant_i = case_when(assn_type == "none" ~ 1,
                               TRUE ~ 0),
         jday = yday(ymd(date_first)))

ggplot(strays, aes(jday, immigrant_i))+geom_count(alpha = 0.5)+geom_smooth(method = "glm", method.args=list(family="binomial"))+theme_classic()+xlab("Julian Day")+ylab("Immigrant\n1 = immigrant 0 = return")+scale_color_viridis_d()+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")

Figure N2: Julian day vs immigrant count at Cougar Trap, including only individuals collected for the first time. An offspring with no assigned parentage is considered an immigrant (scored as 1), an offpsring with at least one parent is considered a return (scored as zero). Circle size corresponds to number of individuals that day. Offspring years from 2013-2020 are considered. Smoothing curve is a simple binomial glm on the immigrant variable using only the effect of Julian day. Julian day 245 is on or one day from the September 1st cutoff used for LSDR. It is displayed as vertical dashed red line.

Here we see there is a clear relationship between the propensity for an individual to be assigned a parent from above the dam and the day it is sampled at the Cougar Trap.

We also know that the overall assignemnt rate varies from year to year. Let’s see if the relationship between Julian day and the propensity for an immigrant to be sampled at the Cougar trap depends on year.

ggplot(strays, aes(jday, immigrant_i, color = as.factor(offspring_year)))+geom_count(alpha = 0.5)+geom_smooth(method = "glm", method.args=list(family="binomial"))+theme_classic()+xlab("Julian Day")+ylab("Immigrant?\n1 = immigrant \n 0 = return")+scale_color_muted(name = "Offspring Year")+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")

Figure N3: Julian day vs immigrant count at Cougar Trap, by offspring year. Only first observation of individual is used. An offspring with no assigned parentage is considered an immigrant (scored as 1), an offpsring with at least one parent is considered a return (scored as zero). Circle size corresponds to number of individuals that day. Offspring years from 2013-2020 are considered. Smoothing curve is a simple binomial glm on the immigrant variable using only the effect of Julian day. Julian day 245 is on or one day from the September 1st cutoff used for LSDR. It is displayed as vertical dashed red line.

Yes, while the general trend is the same, there is some variation among year. 2020 saw almost no immigrants.

Fit GLM
Now let’s fit a smple glm using sex, julian day and their interaction

strays %<>%
  mutate(immigrant = as.factor(immigrant))

strays %<>%
  mutate(offspring_year = as.factor(offspring_year))

stray_glm <- glm( immigrant ~ jday + sex +jday*sex , family = binomial, data = strays)
summary(stray_glm)
## 
## Call:
## glm(formula = immigrant ~ jday + sex + jday * sex, family = binomial, 
##     data = strays)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.4688  -0.9310   0.5247   0.7170   1.6901  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  5.927243   0.602197   9.843  < 2e-16 ***
## jday        -0.023242   0.002834  -8.200 2.41e-16 ***
## sexM         1.268583   0.777978   1.631   0.1030    
## jday:sexM   -0.006903   0.003590  -1.923   0.0545 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1969.6  on 1625  degrees of freedom
## Residual deviance: 1648.1  on 1622  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 1656.1
## 
## Number of Fisher Scoring iterations: 4
drop1(stray_glm, test = "Chisq")

The interaction between the effect of julian day and sex is insignificant (p = 0.0545, likelihood ratio test). So let’s drop it and fit again. We will consider both model fits.

stray_glm2 <- glm( immigrant ~ jday + sex , family = binomial, data = strays)
summary(stray_glm2)
## 
## Call:
## glm(formula = immigrant ~ jday + sex, family = binomial, data = strays)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.6095  -0.9569   0.5102   0.7337   1.6269  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.855777   0.380413  18.022   <2e-16 ***
## jday        -0.027658   0.001732 -15.973   <2e-16 ***
## sexM        -0.208466   0.129765  -1.606    0.108    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 1969.6  on 1625  degrees of freedom
## Residual deviance: 1651.7  on 1623  degrees of freedom
##   (3 observations deleted due to missingness)
## AIC: 1657.7
## 
## Number of Fisher Scoring iterations: 4
drop1(stray_glm2, test = "Chisq")

When we ignore the interaction,we find the effect of day is strongly significant, both by Wald tests and likelihood ratio tests.

stray_glm2 <- glmer( immigrant ~ jday*sex+ (1|offspring_year) , family = binomial, data = strays)
summary(stray_glm2)
drop1(stray_glm2, test = "Chisq")

GLM Interpretation
The response variable is coded with immigrant as the first level and return as the second. Therefore a negative estimated effect of an explanatory variable means that that variable increases the propensity for an immigrant at the trap. (It might be worth releveling the “immigrant” variable factor to make effect estimates and figures consistent with the “probability of immigrant” language used earlier)

So, to interpet the model with the interaction, later in the season there is a significant increase in the proability of immigrants arriving at the trap, and this effect is stronger for males than females. Once you control for julian date, there is no greater propensity for male immigrants to arrive a the trap than for female immigrants.

Let’s also plot this model.

require(effects)

#let's not use the effect package plotting tools and just get the effects

#eff1 <- predictorEffect("sex_ratio_y_l", final_model, focal.levels = seq(-0.4,0.7,by = 0.1))
eff1 <- predictorEffect("jday", stray_glm)
effdf <- as.data.frame(eff1)


ggplot(data = effdf, aes(x = (jday), y = fit, color = sex))+ 
  geom_line()+geom_smooth( aes(ymin = lower, ymax = upper, fill = sex, colour = sex), stat = "identity") +
  theme_bw()+ylab("1 - Predicted Probability of Immigrant") + scale_color_manual(labels = c("Female", "Male"), name = "Sex", values = c("#228833", "#AA3377")) + scale_fill_manual(labels = c("Female", "Male"), name = "Sex", values = c("#228833", "#AA3377"))+xlab("Julian Day")+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")

Figure N4: Predicted probability that an individual arriving at Cougar Trap assigns to a parent released above the Dam vs Julian Day. Vertical red dashed line corresponds to Julian Day 245, on or near September 1st.

Summary
This is somewhat preliminary. We still should do model validation and think about how to incorporate other covariates or whether or not to include random effect of year.

However, it seems like there is a strong relationship between Julian Day the probability that an NOR individual captured at the Cougar trap is an immigrant (doesn’t assign to a parent above the trap). Moreover, there is a weak and marginally significant sex specific pattern. The same proportion of males and females are likely to be immigrants overall, however later in the season males are more likely to be immigrants. These results fit with our biological expectations.

With respect to the broader question, the efficacy of LSDR and or recycling, it seems that relatively few (<20%) of NOR salmon sampled at the trap before around July 1st (Julian day 182, most years) are immigrants. By September 1st, the date previously used as the cutoff for LSDR, about 45% of females and about 55% of males are predicted to be immigrants. However, it should be noted that these results are highly variable among years.

These model predictions using data from return years 2013-2020 are somewhat different from the results reported in 2013, 2014 and 2015 using either the new pedigrees (figure N2) or the pedigrees used to produce previous reports (Banks 2014 and Sard 2016). In Sard 2016, the authors found that during 2013 ~78% of NOR salmon sampled at the Cougar Trap after September 1st were likely immigrants. In Banks 2016, the authors found similar results for offspring years 2014 and 2015 (74% and 77%, respectively). These values are in close agreement with those from the newly inferred pedigrees. In one new offspring year for this report (2019), we oberved a similarly high immigrant after September 1st, but it was lower in all other years (2016, 2017, 2018 and 2020).

GLMM

We can see in the annual plots that not only is there differences between years in the overall rate of immigrants but in the rate of immigrants over time. Therefore we may need to incorporate a random slope. Let’s compete two random effects models to see best how to incorporate year.

stray_glmm1 <- glmmTMB( immigrant ~ jday + sex +jday*sex + (jday|offspring_year) , family = binomial, data = strays, REML = TRUE)

stray_glmm2 <- glmmTMB( immigrant ~ jday + sex +jday*sex + (1|offspring_year) , family = binomial, data = strays, REML = TRUE)

stray_glmm3 <- glmmTMB( immigrant ~ jday + sex +jday*sex  , family = binomial, data = strays, REML = TRUE)
## Warning in fitTMB(TMBStruc): Model convergence problem; extreme or very small
## eigenvalues detected. See vignette('troubleshooting')
AIC(stray_glmm1 , stray_glmm2, stray_glmm3 )
anova(stray_glmm1 , stray_glmm2, stray_glmm3)

Yes the model fit is WAY better with a random slopes effect. Now let’s refit with ML and do model validation and selection

stray_glmm1 <- glmmTMB( immigrant ~ jday + sex +jday*sex + (jday|offspring_year) , family = binomial, data = strays)

drop1(stray_glmm1, test = "Chisq")
summary(stray_glmm1)
##  Family: binomial  ( logit )
## Formula:          immigrant ~ jday + sex + jday * sex + (jday | offspring_year)
## Data: strays
## 
##      AIC      BIC   logLik deviance df.resid 
##   1579.6   1617.4   -782.8   1565.6     1619 
## 
## Random effects:
## 
## Conditional model:
##  Groups         Name        Variance  Std.Dev. Corr  
##  offspring_year (Intercept) 4.814e+00 2.194164       
##                 jday        8.496e-05 0.009218 -0.95 
## Number of obs: 1626, groups:  offspring_year, 8
## 
## Conditional model:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  6.185427   1.043995   5.925 3.13e-09 ***
## jday        -0.023940   0.004642  -5.157 2.50e-07 ***
## sexM         1.216413   0.792222   1.535   0.1247    
## jday:sexM   -0.006948   0.003677  -1.890   0.0588 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Interaction not significant (marginally), let’s fit again.

stray_glmm1 <- glmmTMB( immigrant ~ jday + sex + (jday|offspring_year) , family = binomial, data = strays)

drop1(stray_glmm1, test = "Chisq")
summary(stray_glmm1)
##  Family: binomial  ( logit )
## Formula:          immigrant ~ jday + sex + (jday | offspring_year)
## Data: strays
## 
##      AIC      BIC   logLik deviance df.resid 
##   1581.2   1613.5   -784.6   1569.2     1620 
## 
## Random effects:
## 
## Conditional model:
##  Groups         Name        Variance  Std.Dev. Corr  
##  offspring_year (Intercept) 4.883e+00 2.209639       
##                 jday        8.514e-05 0.009227 -0.95 
## Number of obs: 1626, groups:  offspring_year, 8
## 
## Conditional model:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  7.132594   0.929650   7.672 1.69e-14 ***
## jday        -0.028465   0.004012  -7.094 1.30e-12 ***
## sexM        -0.261477   0.135490  -1.930   0.0536 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Hmm, intercepts and slopes are pretty strongly correlated. This makes sense biologically, but it could also be a convergence issue.

require(effects)

#let's not use the effect package plotting tools and just get the effects

#eff1 <- predictorEffect("sex_ratio_y_l", final_model, focal.levels = seq(-0.4,0.7,by = 0.1))
eff1 <- predictorEffect(c("jday"), stray_glmm1)
effdf <- as.data.frame(eff1)


#ggplot(data = effdf, aes(x = (jday), y = fit))+ 
#  geom_line()+geom_smooth( aes(ymin = lower, ymax = upper), stat = "identity") +
#  theme_bw()+ylab("1 - Predicted Probability of Immigrant") +xlab("Julian Day")+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")

strays_plot_data <- strays %>%
  select(immigrant, jday,  sex , offspring_year) %>%
  drop_na()
strays_plot_data$predict <- predict(stray_glmm1, type="response")

#ggplot()+ geom_smooth(data = strays_plot_data, aes(x = jday, y = predict, color = offspring_year), size = 0.5 , se = FALSE) + geom_line(data = effdf, aes(x = (jday), y = fit))+geom_smooth(data = effdf, aes(x = (jday), ymin = lower, ymax = upper, y = fit), stat = "identity", color = "black", size = 2) +
#  theme_bw()+ylab("1 - Predicted Probability of Immigrant") +xlab("Julian Day")+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")+scale_color_muted(name = "Year")
  

ggplot()+ geom_smooth(data = strays, aes(x = jday, y = 1-immigrant_i, color = offspring_year, fill = offspring_year),method = "glm", method.args=list(family="binomial"), size = 0.5, alpha = 0.2 ) + geom_line(data = effdf, aes(x = (jday), y = fit))+geom_smooth(data = effdf, aes(x = (jday), ymin = lower, ymax = upper, y = fit), stat = "identity", color = "black", size = 2) +
  theme_bw()+ylab("Predicted Proportion of NOR Returns\n Produced Above Cougar Dam") +xlab("Julian Day")+geom_vline(aes(xintercept = 245), linetype = "dashed", color  = "red")+scale_color_muted(name = "Year")+scale_fill_muted(name = "Year")
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 3 rows containing non-finite values (stat_smooth).

Figure Nx: Predicted proportion of NOR returns preoduced above the dam per day, by the GLMM_immigrant. Individual years binomial smooths are also provided.

Evaluation

Let’s also evaluate the recycling program. First we will ask how many fish produced above Cougar Dam from 2015 - 2020 fail to return after being recycled downstream, and how many do successfully return? Then we will ask the same for NOR immigrants.

#here we need the full data, not the deduplicated data
recyc <- full_data_1.0 %>%
  filter(type == "cougar_trap", origin == "NOR", year > 2014) %>%
  left_join(select(pedigree_meta, offspring_sample_id, assn_type), by = c("sample_id" = "offspring_sample_id")) %>%
  mutate(immigrant = case_when(assn_type == "none" ~ "immigrant",
                               TRUE ~ "return"),
         immigrant_i = case_when(assn_type == "none" ~ 1,
                               TRUE ~ 0)) %>%
  group_by(sample_id) %>%
  mutate(returns = n()) %>%
  ungroup() %>%
  filter(returns < 3) %>% # get rid of the 3 or so individuals recorded 3 times at cougar, all have been investigated to be errors
  mutate(returns_f = as.factor(returns))

#now let's dedup, note that we need to use the first date, note the second here
recyc_dedup <- recyc %>%
  group_by(sample_id) %>%
  slice_max(date) %>%
  ungroup()

ggplot(data = recyc_dedup, aes(x = immigrant, fill = returns_f))+geom_bar( position = "stack", alpha = 0.8)+geom_text(stat='count', aes(label=..count..),position = position_stack(vjust = 0.5)) + scale_fill_manual(values = c("#228833", "#AA3377"), name = "Captures at\nCougar Trap") +theme_classic() +xlab("NOR Salmon")+scale_x_discrete(labels = c("NOR Immigrant", "Produced above Dam")  )

Next, can we test for significance.

#let's make the contigency table and make sure it is formatted correctly witha mosaic plot
#mosaicplot(data.frame("immigrant" = c(252,164),"return" = c(278, 1200)))

#ok looks good, let's run fishers exact

#fisher.test(data.frame("immigrant" = c(252,82),"return" = c(275, 600)), alternative = "g")
fisher.test(data.frame("2" = c(600,82),"1" = c(275, 252)), alternative = "g")
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data.frame(`2` = c(600, 82), `1` = c(275, 252))
## p-value < 2.2e-16
## alternative hypothesis: true odds ratio is greater than 1
## 95 percent confidence interval:
##  5.216922      Inf
## sample estimates:
## odds ratio 
##    6.69333

Summary A salmon produced above the dam is 6.7 fold more likely to return a second time after being recycled downstream than an NOR immigrant (Fisher’s exact test, p < 1.0e-16). From 2015 to 2020, 875 salmon produced above the dam were collected at the Cougar Trap and recycled downstream, of these 275 (31%) never returned a second time, and 600 (81%) returned a second time and were released above the dam. Therefore the recycling program “costs” about 31% of NOR returns, but what is the benefit?

Recycling prevented the release of 252 (75%) of the 334 NOR immigrants above the dam. 82 (25%) still were released above the dam even with recycling.

LSDR How does this compare to LSDR? Specifically, during the same time frame, how would LSDR have sorted NOR immigrants and returns above and below the dam? Let’s use the same September 1st cutoff used in 2013 and 2014 and check.

# let's only use the first capture and a sept 1 cutoff

#all fish before jday 245 go above dam on first recapture = recyc2
recyc2 <- recyc %>%
  group_by(sample_id) %>%
  arrange(date) %>%
  slice_head() %>%
  ungroup() %>%
  mutate(jday = yday(date)) %>%
  filter(jday < 245)

# all fish not in recyc2 get lsdr'd = recyc3
recyc3 <- recyc_dedup %>%
  filter(!(sample_id %in% recyc2$sample_id))

# how many returns and NOR immigrants would be placed above the dam before september 1st?
recyc2 %>%
  count(immigrant)
# and how many after?
ggplot(data = recyc3, aes(x = immigrant, fill = returns_f))+geom_bar( position = "stack", alpha = 0.8)+geom_text(stat='count', aes(label=..count..),position = position_stack(vjust = 0.5)) + scale_fill_manual(values = c("#228833", "#AA3377"), name = "Captures at\nCougar Trap") +theme_classic() +xlab("Parent(s) Above the Dam")+scale_x_discrete(labels = c("No Parent\nNOR Immigrant", "At least One\nParent above Dam")  )

In total, LSDR would have resulted in 234 more NOR salmon produced above the dam being placed above the dam ((51+783 = 834) vs. 600) and would prevented 783 NOR salmon from experiencing the additional handling stress of recycling, but it would come at the cost of substantially more NOR immigrants being placed above the dam ((176+36 = 212) vs 82.

# what about release delay? how long is the average interval between first encounter and eventual release of nor salmon causedby recycling?
recyc %>% filter(returns == 2) %>% group_by(sample_id) %>% mutate(diff_date = date - lag(date)) %>% ungroup() %>% summarise(mean = mean(diff_date, na.rm = TRUE), sd = sd(diff_date, na.rm = TRUE))

Yet another consideration when evaluating the recycling program is its effect on release day. Recycling delays the eventual release above the dam by an average of 30.9 days (s.d. = 28.3). Using the effect of release day on TLF estimated by the GLMM, this is expected to reduce fitness by ~12%.

So recycling is working more or less as designed. It limits the release of NOR immigrants above the dam better than LSDR, but comes at the cost of a reduction in the number of NOR returns produced above the dam by about 28% (600 vs 834), an ~12% reduction in fitness by delaying release, and potentially an unquantified further reduction in fitness caused by additional handling.

Fitness

In this section, we calculate the number of offspring assigned to parents from the pedigree and calculate summary statistics.

Data Prep

Let’s get our first parent-focused dataframe together. This will have one row for each unqiue candidate parent as well as the number of offspring assigned to it in the pedigree.

#first let's get a dataframe that can be easily used to calculate parent level information
# all candidate parents, the number of time they appear in the pedigree and their metadata

parents <- dedup %>%
  filter(cand_parent == TRUE)

father_counts <- pedigree %>%
  group_by(father) %>%
  count() %>%
  rename(parent = father)

mother_counts <- pedigree %>%
  group_by(mother) %>%
  count() %>%
  rename(parent = mother)

parent_counts <- bind_rows(mother_counts, father_counts) 
rm(mother_counts)
rm(father_counts)

parents %<>%
  left_join(parent_counts, by = c("sample_id" = "parent")) %>%
  rename(tlf = n) %>%
  mutate(tlf = replace_na(tlf, 0))

TLF Results

Let’s put together some tables of fitness averages by groups.

Note that TLF only applies when we have ages 3, 4, 5 and 6 offspring for a year. However, our AAM results suggests that fitness using only ages 3, 4 and 5 offspring only slightly underestimates TLF (98% of returning offspring are ages 3, 4 or 5), so using returns from 2010 to 2020, we have TLF estimates for 2007-2014 parent years, a good approximation for 2015, and only partial fitness estimates for 2016 and 2017. These caveats are noted in all subsequent tables and figures

# average fitness, by parent year
kable(parents %>%
        filter(year < 2018) %>%
  group_by(year) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))) %>%
    mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year)), align = "c", caption = "Table N11: Total Lifetime Fitness by Parent Year", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N11: Total Lifetime Fitness by Parent Year
year N mean_tlf sd_tlf range
2007 745 0.79 1.53 0 - 17
2008 873 0.53 1.20 0 - 12
2009 1383 0.14 0.44 0 - 4
2010 747 0.22 0.63 0 - 6
2011 725 0.64 1.42 0 - 9
2012 947 0.31 0.68 0 - 5
2013 627 0.40 0.92 0 - 8
2014 652 0.19 0.55 0 - 4
2015* 754 0.17 0.48 0 - 6
2016** 705 0.36 0.80 0 - 8
2017** 605 0.00 0.00 0 - 0

Table N11: TLF per parent year.
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years

# average fitness, by parent year
kable(parents %>%
        filter(year < 2018, type %in% c("cougar_trap", "hatchery_outplant")) %>%
  group_by(year, type) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))) %>%
        mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year),
      type = case_when(type == "hatchery_outplant" ~ "hatchery",
                       type == "cougar_trap" ~ "cougar trap")) %>%
    rename(source = type,), align = "c", caption = "Table N12: Total Lifetime Fitness by Parent Year and Source", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N12: Total Lifetime Fitness by Parent Year and Source
year source N mean_tlf sd_tlf range
2007 hatchery 745 0.79 1.53 0 - 17
2008 hatchery 873 0.53 1.20 0 - 12
2009 hatchery 1383 0.14 0.44 0 - 4
2010 cougar trap 251 0.26 0.65 0 - 4
2010 hatchery 496 0.19 0.63 0 - 6
2011 cougar trap 385 0.74 1.54 0 - 9
2011 hatchery 340 0.54 1.26 0 - 9
2012 cougar trap 517 0.38 0.77 0 - 5
2012 hatchery 430 0.22 0.53 0 - 3
2013 cougar trap 187 0.75 1.30 0 - 8
2013 hatchery 440 0.26 0.65 0 - 5
2014 cougar trap 153 0.27 0.74 0 - 4
2014 hatchery 486 0.17 0.48 0 - 3
2015* cougar trap 154 0.29 0.69 0 - 6
2015* hatchery 600 0.14 0.41 0 - 4
2016** cougar trap 242 0.38 0.88 0 - 8
2016** hatchery 459 0.34 0.75 0 - 6
2017** cougar trap 157 0.00 0.00 0 - 0
2017** hatchery 448 0.00 0.00 0 - 0

Table N12: TLF per parent year, and parent source. Parent source refers to either*** individuals trapped at the Cougar Trap and released above the dam (“Cougar Trap”, NOR and HOR) or individuals trapped at McKEnzie or Leaburg Hatcheries and released above the dam (“Hatchery”, HOR)
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years
*** Note that there are 5 individuals sampled during spawning ground surveys above the dam and 12 precocial males sampled above the dam that were included as candidate parents. None had an offspring assigned to them and they are not presented in this table

# average fitness, by parent year
kable(parents %>%
        filter(year < 2018, type %in% c("cougar_trap", "hatchery_outplant")) %>%
  group_by(year, type, sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))) %>%
        mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year),
      type = case_when(type == "hatchery_outplant" ~ "hatchery",
                       type == "cougar_trap" ~ "cougar trap")) %>%
    rename(source = type,), align = "c", caption = "Table N13: Total Lifetime Fitness by Parent Year, Source and Sex", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N13: Total Lifetime Fitness by Parent Year, Source and Sex
year source sex N mean_tlf sd_tlf range
2007 hatchery F 318 0.89 1.49 0 - 11
2007 hatchery M 427 0.72 1.56 0 - 17
2008 hatchery F 288 0.80 1.46 0 - 12
2008 hatchery M 585 0.40 1.03 0 - 9
2009 hatchery F 603 0.16 0.47 0 - 4
2009 hatchery M 780 0.13 0.43 0 - 3
2010 cougar trap F 62 0.23 0.61 0 - 3
2010 cougar trap M 189 0.27 0.66 0 - 4
2010 hatchery F 201 0.29 0.74 0 - 5
2010 hatchery M 295 0.13 0.53 0 - 6
2011 cougar trap F 150 0.66 1.30 0 - 7
2011 cougar trap M 235 0.79 1.67 0 - 9
2011 hatchery F 170 0.78 1.59 0 - 9
2011 hatchery M 170 0.29 0.73 0 - 4
2012 cougar trap F 191 0.40 0.76 0 - 5
2012 cougar trap M 326 0.36 0.78 0 - 5
2012 hatchery F 248 0.25 0.58 0 - 3
2012 hatchery M 182 0.18 0.46 0 - 2
2013 cougar trap F 86 0.69 1.20 0 - 6
2013 cougar trap M 101 0.80 1.38 0 - 8
2013 hatchery F 239 0.29 0.71 0 - 5
2013 hatchery M 201 0.22 0.57 0 - 3
2014 cougar trap F 59 0.24 0.75 0 - 4
2014 cougar trap M 94 0.30 0.73 0 - 3
2014 hatchery F 327 0.14 0.45 0 - 3
2014 hatchery M 159 0.23 0.54 0 - 3
2015* cougar trap F 53 0.28 0.91 0 - 6
2015* cougar trap M 101 0.29 0.55 0 - 3
2015* hatchery F 412 0.12 0.35 0 - 2
2015* hatchery M 188 0.17 0.51 0 - 4
2016** cougar trap F 86 0.33 0.58 0 - 2
2016** cougar trap M 156 0.42 1.00 0 - 8
2016** hatchery F 315 0.31 0.66 0 - 5
2016** hatchery M 144 0.42 0.93 0 - 6
2017** cougar trap F 45 0.00 0.00 0 - 0
2017** cougar trap M 112 0.00 0.00 0 - 0
2017** hatchery F 325 0.00 0.00 0 - 0
2017** hatchery M 123 0.00 0.00 0 - 0

Table N13: TLF per parent year, parent source and sex. Parent source refers to either*** individuals trapped at the Cougar Trap and released above the dam (“Cougar Trap”, NOR and HOR) or individuals trapped at McKEnzie or Leaburg Hatcheries and released above the dam (“Hatchery”, HOR)
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years
*** Note that there are 5 individuals sampled during spawning ground surveys above the dam and 12 precocial males sampled above the dam that were included as candidate parents. None had an offspring assigned to them and they are not presented in this table

# average fitness, by parent year
kable(parents %>%
        filter(year < 2018, type %in% c("cougar_trap", "hatchery_outplant")) %>%
  group_by(year, origin, sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))) %>%
        mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year)
      ) %>%
    pivot_wider(id_cols = c(year, sex), names_from = origin, values_from = c(N, mean_tlf, sd_tlf, range) ) %>%
    select(year, sex, N_HOR, mean_tlf_HOR, sd_tlf_HOR, range_HOR, N_NOR, mean_tlf_NOR, sd_tlf_NOR, range_NOR), align = "c", caption = "Table N13c: Total Lifetime Fitness by Parent Year, Origin, and Sex", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N13c: Total Lifetime Fitness by Parent Year, Origin, and Sex
year sex N_HOR mean_tlf_HOR sd_tlf_HOR range_HOR N_NOR mean_tlf_NOR sd_tlf_NOR range_NOR
2007 F 318 0.89 1.49 0 - 11
2007 M 427 0.72 1.56 0 - 17
2008 F 288 0.80 1.46 0 - 12
2008 M 585 0.40 1.03 0 - 9
2009 F 603 0.16 0.47 0 - 4
2009 M 780 0.13 0.43 0 - 3
2010 F 206 0.29 0.73 0 - 5 57 0.25 0.63 0 - 3
2010 M 320 0.13 0.52 0 - 6 164 0.29 0.69 0 - 4
2011 F 176 0.76 1.57 0 - 9 144 0.68 1.33 0 - 7
2011 M 193 0.27 0.69 0 - 4 212 0.86 1.74 0 - 9
2012 F 256 0.25 0.58 0 - 3 183 0.41 0.77 0 - 5
2012 M 191 0.18 0.46 0 - 2 317 0.37 0.78 0 - 5
2013 F 248 0.28 0.70 0 - 5 77 0.75 1.25 0 - 6
2013 M 207 0.21 0.56 0 - 3 95 0.85 1.41 0 - 8
2014 F 334 0.14 0.45 0 - 3 52 0.27 0.79 0 - 4
2014 M 172 0.23 0.57 0 - 3 80 0.30 0.72 0 - 3
2015* F 417 0.12 0.35 0 - 2 48 0.31 0.95 0 - 6
2015* M 202 0.17 0.50 0 - 4 87 0.31 0.58 0 - 3
2016** F 336 0.30 0.64 0 - 5 64 0.39 0.63 0 - 2
2016** M 193 0.35 0.84 0 - 6 107 0.54 1.16 0 - 8
2017** F 328 0.00 0.00 0 - 0 42 0.00 0.00 0 - 0
2017** M 125 0.00 0.00 0 - 0 109 0.00 0.00 0 - 0

Table N13b: TLF per parent year, origin and sex. Note NA origin individuals are excluded. * Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years
*** Note that there are 5 individuals sampled during spawning ground surveys above the dam and 12 precocial males sampled above the dam that were included as candidate parents. None had an offspring assigned to them and they are not presented in this table

# average fitness, by parent year
kable(parents %>%
        filter(year < 2018, type %in% c("cougar_trap", "hatchery_outplant")) %>%
  group_by(year, type, origin, sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))) %>%
        mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year),
      type = case_when(type == "hatchery_outplant" ~ "hatchery",
                       type == "cougar_trap" ~ "cougar trap")) %>%
    rename(source = type,), align = "c", caption = "Table N13c: Total Lifetime Fitness by Parent Year, Source Origin, and Sex", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")
Table N13c: Total Lifetime Fitness by Parent Year, Source Origin, and Sex
year source origin sex N mean_tlf sd_tlf range
2007 hatchery HOR F 318 0.89 1.49 0 - 11
2007 hatchery HOR M 427 0.72 1.56 0 - 17
2008 hatchery HOR F 288 0.80 1.46 0 - 12
2008 hatchery HOR M 585 0.40 1.03 0 - 9
2009 hatchery HOR F 603 0.16 0.47 0 - 4
2009 hatchery HOR M 780 0.13 0.43 0 - 3
2010 cougar trap HOR F 5 0.00 0.00 0 - 0
2010 cougar trap HOR M 25 0.16 0.37 0 - 1
2010 cougar trap NOR F 57 0.25 0.63 0 - 3
2010 cougar trap NOR M 164 0.29 0.69 0 - 4
2010 hatchery HOR F 201 0.29 0.74 0 - 5
2010 hatchery HOR M 295 0.13 0.53 0 - 6
2011 cougar trap HOR F 6 0.17 0.41 0 - 1
2011 cougar trap HOR M 23 0.13 0.34 0 - 1
2011 cougar trap NOR F 144 0.68 1.33 0 - 7
2011 cougar trap NOR M 212 0.86 1.74 0 - 9
2011 hatchery HOR F 170 0.78 1.59 0 - 9
2011 hatchery HOR M 170 0.29 0.73 0 - 4
2012 cougar trap HOR F 8 0.25 0.46 0 - 1
2012 cougar trap HOR M 9 0.11 0.33 0 - 1
2012 cougar trap NOR F 183 0.41 0.77 0 - 5
2012 cougar trap NOR M 317 0.37 0.78 0 - 5
2012 hatchery HOR F 248 0.25 0.58 0 - 3
2012 hatchery HOR M 182 0.18 0.46 0 - 2
2013 cougar trap HOR F 9 0.11 0.33 0 - 1
2013 cougar trap HOR M 6 0.00 0.00 0 - 0
2013 cougar trap NOR F 77 0.75 1.25 0 - 6
2013 cougar trap NOR M 95 0.85 1.41 0 - 8
2013 hatchery HOR F 239 0.29 0.71 0 - 5
2013 hatchery HOR M 201 0.22 0.57 0 - 3
2014 cougar trap HOR F 7 0.00 0.00 0 - 0
2014 cougar trap HOR M 13 0.31 0.85 0 - 3
2014 cougar trap NOR F 52 0.27 0.79 0 - 4
2014 cougar trap NOR M 80 0.30 0.72 0 - 3
2014 cougar trap M 1 0.00 0 - 0
2014 hatchery HOR F 327 0.14 0.45 0 - 3
2014 hatchery HOR M 159 0.23 0.54 0 - 3
2015* cougar trap HOR F 5 0.00 0.00 0 - 0
2015* cougar trap HOR M 14 0.14 0.36 0 - 1
2015* cougar trap NOR F 48 0.31 0.95 0 - 6
2015* cougar trap NOR M 87 0.31 0.58 0 - 3
2015* hatchery HOR F 412 0.12 0.35 0 - 2
2015* hatchery HOR M 188 0.17 0.51 0 - 4
2016** cougar trap HOR F 21 0.14 0.36 0 - 1
2016** cougar trap HOR M 49 0.14 0.41 0 - 2
2016** cougar trap NOR F 64 0.39 0.63 0 - 2
2016** cougar trap NOR M 107 0.54 1.16 0 - 8
2016** cougar trap F 1 0.00 0 - 0
2016** hatchery HOR F 315 0.31 0.66 0 - 5
2016** hatchery HOR M 144 0.42 0.93 0 - 6
2017** cougar trap HOR F 3 0.00 0.00 0 - 0
2017** cougar trap HOR M 2 0.00 0.00 0 - 0
2017** cougar trap NOR F 42 0.00 0.00 0 - 0
2017** cougar trap NOR M 109 0.00 0.00 0 - 0
2017** cougar trap M 1 0.00 0 - 0
2017** hatchery HOR F 325 0.00 0.00 0 - 0
2017** hatchery HOR M 123 0.00 0.00 0 - 0

Table N13c: TLF per parent year, parent source, origin and sex. Parent source refers to either*** individuals trapped at the Cougar Trap and released above the dam (“Cougar Trap”, NOR and HOR) or individuals trapped at McKEnzie or Leaburg Hatcheries and released above the dam (“Hatchery”, HOR)
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years
*** Note that there are 5 individuals sampled during spawning ground surveys above the dam and 12 precocial males sampled above the dam that were included as candidate parents. None had an offspring assigned to them and they are not presented in this table

Let’s also put together a figure to make any trend easier to identify.

plot_data <- parents %>%
        filter(year < 2016, type %in% c("cougar_trap", "hatchery_outplant")) %>%
  group_by(year, type, sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), se_tlf = sd(tlf)/sqrt(n()), range = paste(min(tlf), " -  ", max(tlf))) %>%
  mutate(type = case_when(type == "hatchery_outplant" ~ "hatchery",
                       type == "cougar_trap" ~ "cougar trap")) %>%
  ungroup() %>%
  unite("type_sex", type, sex)

ggplot(data = plot_data, aes(x = year, y = mean_tlf, color = c(type_sex)))+geom_point(size = 2)+geom_line(size = 1.5, alpha = 0.5)+geom_errorbar(aes(ymin=mean_tlf-se_tlf, ymax=mean_tlf+se_tlf), width=.1)+scale_color_viridis_d(labels = c("Cougar Trap Female", "Cougar Trap Male", "Hatchery Female", "Hatchery Male"), name = "Source, Sex")+theme_classic()+scale_x_continuous(n.breaks = 9)+xlab("Year")+ylab("TLF")

plot_data <- parents %>%
        filter(year < 2016, !(is.na(origin))) %>%
  group_by(year, origin, sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), se_tlf = sd(tlf)/sqrt(n()), range = paste(min(tlf), " -  ", max(tlf))) %>%
  ungroup() %>%
  unite("origin_sex", origin, sex)

ggplot(data = plot_data, aes(x = year, y = mean_tlf, color = c(origin_sex)))+geom_point(size = 2)+geom_line(size = 1.5, alpha = 0.5)+geom_errorbar(aes(ymin=mean_tlf-se_tlf, ymax=mean_tlf+se_tlf), width=.1)+scale_color_bright(labels = c("HOR Female", "HOR Male", "NOR Female", "NOR Male"), name = "Origin, Sex")+theme_classic()+scale_x_continuous(n.breaks = 9, labels = c("2007", "2008", "2009", "2010", "2011", "2012", "2013", "2014", "2015*"))+xlab("Year")+ylab("TLF")

Figure N5: Mean TLF* and standard error by Candidate Parent Source, Sex and Year. Source refers to either Cougar Trap (includes NOR and HOR) or Hatchery (only HOR).
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)

From the figure, some trends appear that might be more difficult for some to notice in the tables:
(1) Parents from Cougar Trap usually ahve higher TLF than parents from the hatchery.
(2) Male TLF is almost always lower than female TLF among parents outplanted from the hatchery, but not among parents from Cougar trap.
(3) Changes in fitness from year to year appear correlated between Cougar Trap and the hatchery.

Finally let’s put together a summary table of mean TLF for sexes, and origin, not divided by year to use in the text.

kable(parents %>%
        filter(year < 2016) %>%
  group_by(origin) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))
      ),
     align = "c", caption = "", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")

kable(parents %>%
        filter(year < 2016) %>%
  group_by(sex) %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))
      ),
     align = "c", caption = "", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")

kable(parents %>%
        filter(year < 2016) %>%
  group_by() %>%
  summarise(N = n(), mean_tlf = mean(tlf), sd_tlf = sd(tlf), range = paste(min(tlf), " -  ", max(tlf))
      ),
     align = "c", caption = "", digits = 2) %>%
  kable_classic(full_width = F, html_font = "Arial")

Cohort Replacement Rates

Here we estimate the cohort replacement rate (CRR) across different groups of parents.

CRR is defined as the number of spawners produced by a spawner.

Results are prepared for each of the following groups:

  • All parents, by year
  • All parents, by year and sex
a <- pedigree_meta %>%
  filter(!(father_type == "none" & mother_type == "none")) %>% 
  group_by(parent_year) %>%
  summarise(offspring_n = n())

b <- dedup %>%
  filter(cand_parent == TRUE) %>%
  count(year) %>%
  rename(parent_year = year, n_candidate_parents = n)

kable(left_join(a,b) %>%
        mutate(parent_year = as.character(parent_year),
      parent_year = case_when(parent_year == 2015 ~ "2015*",
                       parent_year == 2016 ~ "2016**",
                       parent_year == 2017 ~ "2017**",
                            TRUE ~ parent_year),
      CRR = offspring_n / n_candidate_parents ), align = "c", caption = "Table N13: Cohort Replacement Rate") %>% kable_classic(full_width = F, html_font = "Arial") %>%
  kable_styling(fixed_thead = T) 
Table N13: Cohort Replacement Rate
parent_year offspring_n n_candidate_parents CRR
2007 331 745 0.4442953
2008 247 873 0.2829324
2009 114 1383 0.0824295
2010 94 747 0.1258367
2011 284 725 0.3917241
2012 171 947 0.1805702
2013 151 627 0.2408293
2014 73 652 0.1119632
2015* 69 754 0.0915119
2016** 135 705 0.1914894
rm(a)
rm(b)

Table N13: Cohort Replacement Rate (CRR) per parent year. CRR is defined as the number of spawners produced by a spawner. In our case it is the number of offspring successfully assigned to at least one parent in a given year, divided by the number of candidate parents in that year.
* Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years

# add offspring sex to pedigree_meta

pedigree_meta %<>%
  left_join(select(dedup, sex, sample_id), by = c("offspring_sample_id" = "sample_id")) %>%
  rename(offspring_sex = sex)

father_male_offspring_counts <- pedigree_meta %>%
  filter(offspring_sex == "M") %>%
  group_by(father) %>%
  count() %>%
  rename(parent = father)

mother_female_offspring_counts <- pedigree_meta %>%
  filter(offspring_sex == "F") %>%
  group_by(mother) %>%
  count() %>%
  rename(parent = mother)

parent_counts_same_sex <- bind_rows(mother_female_offspring_counts, father_male_offspring_counts) 
rm(father_male_offspring_counts)
rm(mother_female_offspring_counts)

parents %<>%
  left_join(parent_counts_same_sex, by = c("sample_id" = "parent")) %>%
  rename(same_sex_offspring = n) %>%
  mutate(same_sex_offspring = replace_na(same_sex_offspring, 0))

kable(parents %>%
  group_by(year, sex) %>%
    filter(year < 2018) %>%
  summarise(n = n(), crr_sex = sum(same_sex_offspring)/n(), n_offspring_same_sex = sum(same_sex_offspring)) %>%
    mutate(year = as.character(year),
      year = case_when(year == 2015 ~ "2015*",
                       year == 2016 ~ "2016**",
                       year == 2017 ~ "2017**",
                            TRUE ~ year)),
       align = "c", digits = 2, caption = "Table N14") %>%
  kable_classic(full_width = F, html_font = "Arial") %>%
  kable_styling(fixed_thead = T) %>%
  scroll_box(height = "400px")
Table N14
year sex n crr_sex n_offspring_same_sex
2007 F 318 0.36 114
2007 M 427 0.43 182
2008 F 288 0.30 85
2008 M 585 0.26 151
2009 F 603 0.06 34
2009 M 780 0.08 62
2010 F 263 0.12 31
2010 M 484 0.11 53
2011 F 320 0.34 110
2011 M 405 0.32 128
2012 F 439 0.11 49
2012 M 508 0.19 98
2013 F 325 0.14 46
2013 M 302 0.26 78
2014 F 386 0.08 31
2014 M 266 0.11 30
2015* F 465 0.06 26
2015* M 289 0.12 36
2016** F 403 0.10 39
2016** M 302 0.29 89
2017** F 370 0.00 0
2017** M 235 0.00 0
# Let's check one of these to make sure the code is working correctly. In 2016, there were 452 male outplants. These 452 potential fathers appear in the final pedigree 1092 times. Of these 1092 offspring, 723 is male. Therefore the correct CRR for 2016 male outplants is 1.5995575. This matches the table.

Table N14: Sex specific CRR by year, sex. Here CRR refers to the nuber of female offspring produced by female parents, or the number of male offspring produced by male parents. * Note that 2015 estimates do not include potential year 6 offspring. However we expect these offspring to contribute very little to TLF (~2%)
** Note that 2016 and 2017 offspring do not include potential year 5 and 6 offspring, and potential year 4 5 and 6 offspring, which are expected to substantially contribute to TLF for these parents years

Effective Number of Breeders

Calculated Nb using LD method from NeEstimator v2.1. Used GUI, so only logging the parameters used here, not calculating the values.

Parameters:
S+ option (exclude singletons)
CIs: 95% confidence intervals from jacknife re-sampling method
Data: Data for a given year is the assigned offspring of that year

Data Prep

Let’s create the input data for NeEstimator. We’ll take advantage of a wrapper function from adegenet and rLDNE to output a genepop file all assigned offspring for each parent year.

2007

# Here we filter the genotype data to get only offspring from 2007 cohort
offspring_of_2007 <- pedigree_meta %>%
  filter(mother_year == 2007 | father_year ==2007) %>%
  pull(offspring_sample_id)

gts_off_of_2007 <- dedup %>%
  filter(sample_id %in% offspring_of_2007) %>%
  select(sample_id, starts_with("Ot")) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2007 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2007 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2007[,3:ncol(gts_off_of_2007)],pops = gts_off_of_2007$pop,ind.ids = gts_off_of_2007$sample_id,folder = "neestimator/",filename ="genepop_2007.txt",ncode = 3,diploid = T)

2008

# Here we filter the genotype data to get only offspring from 2008 cohort
offspring_of_2008 <- pedigree_meta %>%
  filter(mother_year == 2008 | father_year ==2008) %>%
  pull(offspring_sample_id)

gts_off_of_2008 <- dedup %>%
  filter(sample_id %in% offspring_of_2008) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2008 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2008 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2008[,3:ncol(gts_off_of_2008)],pops = gts_off_of_2008$pop,ind.ids = gts_off_of_2008$sample_id,folder = "neestimator/",filename ="genepop_2008.txt",ncode = 3,diploid = T)

2009

# Here we filter the genotype data to get only offspring from 2009 cohort
offspring_of_2009 <- pedigree_meta %>%
  filter(mother_year == 2009 | father_year ==2009) %>%
  pull(offspring_sample_id)

gts_off_of_2009 <- dedup %>%
  filter(sample_id %in% offspring_of_2009) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2009 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2009 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2009[,3:ncol(gts_off_of_2009)],pops = gts_off_of_2009$pop,ind.ids = gts_off_of_2009$sample_id,folder = "neestimator/",filename ="genepop_2009.txt",ncode = 3,diploid = T)

2010

# Here we filter the genotype data to get only offspring from 2010 cohort
offspring_of_2010 <- pedigree_meta %>%
  filter(mother_year == 2010 | father_year ==2010) %>%
  pull(offspring_sample_id)

gts_off_of_2010 <- dedup %>%
  filter(sample_id %in% offspring_of_2010) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2010 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2010 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2010[,3:ncol(gts_off_of_2010)],pops = gts_off_of_2010$pop,ind.ids = gts_off_of_2010$sample_id,folder = "neestimator/",filename ="genepop_2010.txt",ncode = 3,diploid = T)

2011

# Here we filter the genotype data to get only offspring from 2011 cohort
offspring_of_2011 <- pedigree_meta %>%
  filter(mother_year == 2011 | father_year ==2011) %>%
  pull(offspring_sample_id)

gts_off_of_2011 <- dedup %>%
  filter(sample_id %in% offspring_of_2011) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2011 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2011 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2011[,3:ncol(gts_off_of_2011)],pops = gts_off_of_2011$pop,ind.ids = gts_off_of_2011$sample_id,folder = "neestimator/",filename ="genepop_2011.txt",ncode = 3,diploid = T)

2012

# Here we filter the genotype data to get only offspring from 2012 cohort
offspring_of_2012 <- pedigree_meta %>%
  filter(mother_year == 2012 | father_year ==2012) %>%
  pull(offspring_sample_id)

gts_off_of_2012 <- dedup %>%
  filter(sample_id %in% offspring_of_2012) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2012 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2012 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2012[,3:ncol(gts_off_of_2012)],pops = gts_off_of_2012$pop,ind.ids = gts_off_of_2012$sample_id,folder = "neestimator/",filename ="genepop_2012.txt",ncode = 3,diploid = T)

2013

# Here we filter the genotype data to get only offspring from 2013 cohort
offspring_of_2013 <- pedigree_meta %>%
  filter(mother_year == 2013 | father_year ==2013) %>%
  pull(offspring_sample_id)

gts_off_of_2013 <- dedup %>%
  filter(sample_id %in% offspring_of_2013) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2013 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2013 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2013[,3:ncol(gts_off_of_2013)],pops = gts_off_of_2013$pop,ind.ids = gts_off_of_2013$sample_id,folder = "neestimator/",filename ="genepop_2013.txt",ncode = 3,diploid = T)

2014

# Here we filter the genotype data to get only offspring from 2014 cohort
offspring_of_2014 <- pedigree_meta %>%
  filter(mother_year == 2014 | father_year ==2014) %>%
  pull(offspring_sample_id)

gts_off_of_2014 <- dedup %>%
  filter(sample_id %in% offspring_of_2014) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2014 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2014 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2014[,3:ncol(gts_off_of_2014)],pops = gts_off_of_2014$pop,ind.ids = gts_off_of_2014$sample_id,folder = "neestimator/",filename ="genepop_2014.txt",ncode = 3,diploid = T)

2015

# Here we filter the genotype data to get only offspring from 2015 cohort
offspring_of_2015 <- pedigree_meta %>%
  filter(mother_year == 2015 | father_year ==2015) %>%
  pull(offspring_sample_id)

gts_off_of_2015 <- dedup %>%
  filter(sample_id %in% offspring_of_2015) %>%
  select(sample_id, starts_with(c("Ot"))) %>% #grab the genotypes
  mutate(across(.cols = everything(), ~na_if(., "0")))

#now we put bth alleles into a single column for each locus
gts_off_of_2015 %<>%
  gather(key = var, value = value, -sample_id) %>%
  mutate(var = str_extract(var, "\\d+") %>% as.numeric()) %>% 
  group_by(sample_id, var) %>%
  summarise(combined = paste(value, collapse = "")) %>% 
  spread(key = var, value = combined) 

# add a dummy pop variable for conversion, fix NAs
gts_off_of_2015 %<>%
  add_column(pop = "p") %>%
  relocate(sample_id, pop) %>%
  mutate(across(.cols = everything(), ~str_replace(., "NANA", "000000")))

write_genepop_zlr(loci = gts_off_of_2015[,3:ncol(gts_off_of_2015)],pops = gts_off_of_2015$pop,ind.ids = gts_off_of_2015$sample_id,folder = "neestimator/",filename ="genepop_2015.txt",ncode = 3,diploid = T)

Results

Neestimator results are not formatted to be easily parsed by machine. Instead, wrote results manually to a spreadsheet.

ne <- read_tsv("neestimator/ne_estimator_results.txt")

# let's build a table by parent year of number of candidate parents, number of successful parents, and number of assigned offspring
a <- parents %>%
  filter(year <2016) %>%
  group_by(year) %>%
  summarise(n_cand = n(), n_successful = sum(tlf >0))

b <- pedigree_meta %>%
  count(parent_year) %>%
  filter(parent_year < 2016) %>%
  rename(year = parent_year)

nb_table <- left_join(a,b) %>%
  rename(n_offspring = n)
  
#left_join(nb_table, ne) #just to check the offspring datasets were correct, looks good!

kable(nb_table %<>%
  left_join(ne) %>%
  select(-N_offspring, Nb = Ne) %>%
    mutate(year = as.character(year),
           year = case_when(year == "2015" ~ "2015*",
                            TRUE ~ year),
           Nb_Nsuccess_ratio = Nb/n_successful),
       align = "c", digits = 2, caption = "Table N15: Effective Number of Breeders") %>%
  kable_classic(full_width = F, html_font = "Arial") 
Table N15: Effective Number of Breeders
year n_cand n_successful n_offspring Nb CI_lower CI_upper Nb_Nsuccess_ratio
2007 745 261 331 265.5 222.7 322.8 1.02
2008 873 229 247 247.5 203.7 308.8 1.08
2009 1383 156 114 368.6 228.1 848.6 2.36
2010 747 105 94 169.8 116.6 288.0 1.62
2011 725 209 284 220.1 181.7 272.7 1.05
2012 947 206 171 297.2 216.5 451.4 1.44
2013 627 152 151 139.8 104.3 199.7 0.92
2014 652 90 73 167.5 104.7 361.6 1.86
2015* 754 103 69 211.9 122.1 627.5 2.06

Table N15 Effective number of breeders (Nb) per parent year as estimated by NeEstimator. Number of candidate parents (N_cand) is the number of salmon released above the Cougar Dam in a given year that were sampled, and successfully genotyped. N_successful is the number of candidate parents with one or more offspring in the pedigree. N_offspring is the nmber of offspring assigned to candidate parents released above the dam that parent year. 95% Confidence intervals based on jack-knife are provided. Finally the Nb_Nsuccess ratio is the Nb estimate divided by the number of successful breeders.

Predictors of Fitness

In this section we fit a GLMM on TLF of salmon released above the dam.

Mixed Modeling Approach Overview

We primarily follow the approach described in Zuur et al 2009. Mixed Effects Models and Extensions in Ecology with R.

  1. Define Predictors BEfore any analysis, we need to think about what predictors are in the dataset or can be extracted from it and how they may interact.
  2. Exploratory Data Analysis: First we explore the relationships between all predictors in the dataset.
  3. Multicollinearity: In addition to our findings from EDA above, we fit main effects models and calculate variance inflation factors to identify multicollinearity among main effects.
  4. Random Effects Model Selection: After removing effects contributing to multicollinearity in a main effects only model, we fit an otherwise saturated models including all main effects and interactions, but varied which random effects were included. Model fit was by REML. We chose the best random effects structure by AIC and LRTs.
  5. Fixed Effects Model Selection: We then conducted model selection on the fixed effects using LRTs and backward stepwise selection on Wald tests for signficant effects (p < 0.05 not multiple comparison corrected). Model fit was by ML. For the LRTs we stepwise dropped non-significant interaction terms from the model, to evaluate if main effects were significant.
  6. Model Validation: We validated the model using simulated residuals from the DHARMa package.
  7. Estimated Marginal Means and Effect: To improve biological intepretation of signficant predictors of fitness, we estimated marginal means for significant effects.

Predictors

There are a lot of potential predictors and interactions. Here we describe each and rationalize which to consider interactions between.

  1. Release Day: What day were individuals released? We will transform date into Julian Day. Here we can expect an optimum so we may want to consider a non-linear effect.
  2. Release Location: There are several release locations at various points above the dam from the head of the reservoir all the way to 37km upriver. We can consider release location as a factor or rkm as a continuous variable. We will consider both and only retain one.
  3. Origin: What is the origin of the individual, NOR or HOR?
  1. Origin of Parents: Since our pedigree includes so many years we can also consider (for NOR individuals) whether an individuals descends from a HORxHOR, HORxNOR, or NORxNOR cross. This one is a bit complicated though, and will drastically reduce sample size. Since our mixed modeling tools are not tolerant of missing data will will not inclde this, but will return to it in a post-hoc analysis.
  2. Origin of Grandparents: Same as above but for grandparents. Unlikely to include since we have so few F2s with fitness estimates in the dataset.
  1. Source: Was the individual trucked from the Cougar Trap of the Hatchery. This is expected to be strongly collinear with origin, so it is unlikely to be retained in the model, but there are 218 cougar trap HORs that can be compared to hatchery outplant HORs.
  2. Release Group Variables: These are variables nested within release group, the set of fish released together from the same source on the same day at the same location.
  1. Release group: Used as a random effect, the release group itself. Fitting as a random effect allows us to beneift from shrinkage to improve estimates at fixed effects, and we can also gain an understanding of how inter-release group variance contributes to variance in fitness overall.
  2. Release Group Sex Ratio: What is the sex ratio of fish in a release group? If individuals from a single release group are more likely to spawn together than individuals from different release group, sex ratio might matter. Fit as non-linear effect as there is expected to be an optimum.
  3. Release Group Density: What is the density of a release group? Is there an optimum number to release at one time? Fit as non-linear effect.
  1. Annual Variables: These are variables nested within year.
  1. Year: Random effect of year.
  2. Annual Density: Is there an optimum density for releases? Fit as non-linear effect.
  3. Annual Sex Ratio: This was the single most important predictor in the North Santiam. Fit as non-linear effect as there is expected to be an optimum.
  1. Size: We have an estimate of size at maturity for about half (4321/8985) of our candidate parents. Our modeling tools are not compatible with missing data, so this would effectively cut our sample size in half. We chose not to fit it, but will consider it in a post hoc analysis. For example, it is known that origin effects on fitness are mediated at least somewhat through size, so if origin has an effect we should examine if there is a size interaction.

Interactions
* Sex * Sex Ratios: This one seems obvious
* Sex * Release Day: This one follows from ecology of salmon spawning. Selective pressures on males and females differ over time during spawning.
* Sex * Origin: Previous work in this system found this interaction. We should follow up here.

There are other interactions that might be interesting (e.g sex* release group density), but each one costs a lot of degrees of freedom, especially for release day, which might be fit as a non-linear effect. Let’s ignore all other interactions.

Data, EDA and Multicollinearity

Here we prepare the dataset for modeling and conduct some exploratory data analysis.

Our dataset includes all candidate parents (i.e. released above the dam) from 2007 to 2015. Note that 2015 TLF does not include age 6 offsrping in their TLFs, but our AAM analysis suggests year 6 offspring contribute little to TLF (~2%).

mm_data <-  parents %>%
  filter( year <2016) %>%
  select(date, sex, release, rkm, origin, year, type, tlf) %>%
  drop_na() %>%
  mutate(jday = as.numeric(format(date, "%j"))) %>% #julian day in this case: days since the first day of the year
  mutate(jday_c = scale(jday, scale = F), #center the julian day to help with convergence
         sex = as.factor(sex),
         release= as.factor(release),
         year = as.factor(year),
         group = as.factor(paste(date, release, type)))

# The dataset include 7438 individuals

# lets add density
dens <- mm_data %>%
 group_by(jday, release, year, type) %>%
  summarise(density = n())
## `summarise()` has grouped output by 'jday', 'release', 'year'. You can override
## using the `.groups` argument.
mm_data %<>%
  left_join(dens)
## Joining, by = c("release", "year", "type", "jday")
# lets add overall size of release in a year
dens <- mm_data %>%
 group_by(year) %>%
  summarise(annual_n = n())

mm_data %<>%
  left_join(dens)
## Joining, by = "year"
# lets add sex ratio
#build release group sex ratio variable
f <- mm_data %>% 
  filter(sex == "F") %>%
  group_by(release, jday, year, type) %>% 
  summarise(n_female_rg = n()) 
## `summarise()` has grouped output by 'release', 'jday', 'year'. You can override
## using the `.groups` argument.
mm_data %<>%
  left_join(f)
## Joining, by = c("release", "year", "type", "jday")
m <- mm_data %>% 
  filter(sex == "M") %>%
  group_by(release, jday, year, type) %>% 
  summarise(n_male_rg = n()) 
## `summarise()` has grouped output by 'release', 'jday', 'year'. You can override
## using the `.groups` argument.
mm_data %<>%
  left_join(m) %>%
  mutate(sex_ratio_rg = n_male_rg/n_female_rg)
## Joining, by = c("release", "year", "type", "jday")
# maybe the release groups all mix thoroughly spawn together and we should fit sex ratio at the level of year
f <- mm_data %>% 
  filter(sex == "F") %>%
  group_by( year) %>% 
  summarise(n_female_y = n()) 

mm_data %<>%
  left_join(f)
## Joining, by = "year"
m <- mm_data %>% 
  filter(sex == "M") %>%
  group_by( year) %>% 
  summarise(n_male_y = n()) 

mm_data %<>%
  left_join(m) %>%
  mutate(sex_ratio_y = n_male_y/n_female_y)
## Joining, by = "year"
mm_data %<>%
  mutate(sex_ratio_rg_l = log(sex_ratio_rg),
         sex_ratio_y_l = log(sex_ratio_y))

# there are a bunch of days where the only males or only females are released, this means we wind up with NAs when trying to log transform release group level variable, reducing sample size by 375 individuals. If we don't use that variable (release group sex ratio), then we can get rid of this dataset and use one with the full set of individuals, let's remember this

mm_data %<>%
  drop_na()

# oops forgot to convert origin to a factor
mm_data %<>%
  mutate(origin = as.factor(origin))

Let’s explore the data a bit too and look for issues with collinearity. We will assume that fitness in Mckenzie is best fit with a negative binomial, just like in North Santiam, but we’ll explore the possibility of using other distributions as well (poisson, hurdle and zero inflation).

##################################################################
##################################################################
#Here are some functions that we took from the pairs help file and
#modified, or wrote ourselves. To cite these, use the r citation: citation()

panel.cor <- function(x, y, digits=1, prefix="", cex.cor = 6)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(0, 1, 0, 1))
  r1=cor(x,y,use="pairwise.complete.obs")
  r <- abs(cor(x, y,use="pairwise.complete.obs"))
  txt <- format(c(r1, 0.123456789), digits=digits)[1]
  txt <- paste(prefix, txt, sep="")
  if(missing(cex.cor)) { cex <- 0.9/strwidth(txt) } else {
     cex = cex.cor}
  text(0.5, 0.5, txt, cex = cex * r)
}

##################################################################
panel.smooth2=function (x, y, col = par("col"), bg = NA, pch = par("pch"),
                        cex = 1, col.smooth = "black", span = 2/3, iter = 3, ...)
{
  points(x, y, pch = pch, col = col, bg = bg, cex = cex)
  ok <- is.finite(x) & is.finite(y)
  if (any(ok))
    lines(stats::lowess(x[ok], y[ok], f = span, iter = iter),
          col = 1, ...)
}

##################################################################
panel.lines2=function (x, y, col = par("col"), bg = NA, pch = par("pch"),
                       cex = 1, ...)
{
  points(x, y, pch = pch, col = col, bg = bg, cex = cex)
  ok <- is.finite(x) & is.finite(y)
  if (any(ok)){
    tmp=lm(y[ok]~x[ok])
    abline(tmp)}
}

##################################################################
panel.hist <- function(x, ...)
{
  usr <- par("usr"); on.exit(par(usr))
  par(usr = c(usr[1:2], 0, 1.5) )
  h <- hist(x, plot = FALSE)
  breaks <- h$breaks; nB <- length(breaks)
  y <- h$counts; y <- y/max(y)
  rect(breaks[-nB], 0, breaks[-1], y, col="white", ...)
}
select(mm_data, tlf, sex, jday, density,sex_ratio_y_l, sex_ratio_rg_l, annual_n , release, origin) %>%
  pairs(., lower.panel = panel.cor, diag.panel = panel.hist, upper.panel = panel.smooth2)

mm_data %>%
  select(tlf, sex, jday, density,sex_ratio_y_l, sex_ratio_rg_l, annual_n , release, origin) %>%
  mutate(tlf = log(tlf)) %>%
  pairs(., lower.panel = panel.cor, diag.panel = panel.hist, upper.panel = panel.smooth2)

The bi-plots are revealing some issues with balance and collinearity. We will explore mutlicollinearity with VIFs, but wrt balance, it looks like annual density (total number of individuals released in a year), might be a problem. There is one year where many more individuals were released than others. This might cause an issue fitting models. Let’s exclude it from the modeling.

Let’s fit a model of fixed effects and look for collinearity among predictors

beyond_opt_main <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) +  origin + release , data = mm_data)
kable(vif(beyond_opt_main), caption = "Variance Inflation Factors for fully saturated fixed effect model") %>% kable_classic(full_width = F, html_font = "Cambria" ) 
Variance Inflation Factors for fully saturated fixed effect model
GVIF Df GVIF^(1/(2*Df))
poly(jday_c, 2) 1.582574 2 1.121608
sex 1.126921 1 1.061565
poly(density, 2) 3.075282 2 1.324254
poly(sex_ratio_y_l, 2) 2.508704 2 1.258527
poly(sex_ratio_rg_l, 2) 1.711704 2 1.143819
origin 2.504500 1 1.582561
release 3.197109 4 1.156364

Unlike the North Santiam dataset, it looks like releases were structured in such a way that there is limited multicollinearity in the dataset! The worst offender is origin, with a DF corrected GVIF of 1.58. This is barely above the value of 1.5 applied by the most conservative modelers, and well below the more typical cutoffs of 10 or 5. From the biplots, we can assume much of the VIF comes from collinearity between origin and release location, release group density, or all three combined. NORs are only released at two sites and HORs are released at 5. NORs are also released at lower densities than HORs.

To check that this is the source of the multicollinearity, let’s build a model without release group density and without release location.

beyond_opt_main <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin, data = mm_data)
kable(vif(beyond_opt_main), caption = "Variance Inflation Factors for model without release location") %>% kable_classic(full_width = F, html_font = "Cambria" ) 
Variance Inflation Factors for model without release location
GVIF Df GVIF^(1/(2*Df))
poly(jday_c, 2) 1.287246 2 1.065161
sex 1.125568 1 1.060928
poly(density, 2) 2.680494 2 1.279540
poly(sex_ratio_y_l, 2) 1.911502 2 1.175828
poly(sex_ratio_rg_l, 2) 1.587541 2 1.122487
origin 1.877044 1 1.370053
beyond_opt_main <- glm.nb(tlf ~ poly(jday_c,2) + sex  + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2)  + origin + release, data = mm_data)
kable(vif(beyond_opt_main), caption = "Variance Inflation Factors for model without release group density") %>% kable_classic(full_width = F, html_font = "Cambria" ) 
Variance Inflation Factors for model without release group density
GVIF Df GVIF^(1/(2*Df))
poly(jday_c, 2) 1.543153 2 1.114557
sex 1.127619 1 1.061894
poly(sex_ratio_y_l, 2) 1.934718 2 1.179382
poly(sex_ratio_rg_l, 2) 1.639354 2 1.131536
origin 1.863935 1 1.365260
release 2.861730 4 1.140456

Yes, removing either release group density or release location reduces GVIFs.

Distribution

We have assumed so far that the negative binomial is the correct distribution to model fitness. This follows previous work by our lab and even major review papers, so it is clear negative binomial is superior to poisson and quasipoisson. However, there are extensions to negative binomial to deal with zero-inflation including ZINB and hurdle models and we should always check that a simpler model isn’t better.

We will fit fully saturated fixed effect model with interactions (no random effects) under poisson, quasipoisson, negbin, zero inflated negbin and hurdle negbin models, conduct a simple model selection procedure (only remove predictors that don’t pass an LRT at p < 0.05) and compare fits of the optimized model.

Poisson

Let’s start with Poisson.

pois <- glm(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2), data = mm_data, family = "poisson")
drop1(pois, test = "Chisq")

Dropping sex*density(non-linear) and refitting.

pois <- glm(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + sex*origin + sex*poly(sex_ratio_y_l, 2) + sex*density, data = mm_data, family = "poisson")
drop1(pois, test = "Chisq")

Dropping sex*density and refitting.

pois <- glm(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + sex*origin + sex*poly(sex_ratio_y_l, 2) , data = mm_data, family = "poisson")
drop1(pois, test = "Chisq")

Final Poisson model includes: Julian Day (Non-linear), Release Group Density (Non-Linear), Annual Sex Ratio (Non-Linear), Sex, Origin, and the interaction between Sex and Origin and Sex and Annual Sex Ratio.

Let’s look at the model fit

summary(pois)
## 
## Call:
## glm(formula = tlf ~ poly(jday_c, 2) + sex + poly(density, 2) + 
##     poly(sex_ratio_y_l, 2) + poly(sex_ratio_rg_l, 2) + origin + 
##     sex * origin + sex * poly(sex_ratio_y_l, 2), family = "poisson", 
##     data = mm_data)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.3535  -0.8944  -0.7433  -0.5289  10.0489  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.04372    0.03489 -29.915  < 2e-16 ***
## poly(jday_c, 2)1              -3.15056    1.91575  -1.645 0.100060    
## poly(jday_c, 2)2              -9.66823    1.79321  -5.392 6.98e-08 ***
## sexM                          -0.29841    0.05159  -5.784 7.30e-09 ***
## poly(density, 2)1            -19.55124    2.66376  -7.340 2.14e-13 ***
## poly(density, 2)2             -8.14637    2.49248  -3.268 0.001082 ** 
## poly(sex_ratio_y_l, 2)1       32.97472    3.13559  10.516  < 2e-16 ***
## poly(sex_ratio_y_l, 2)2       -5.85338    2.65529  -2.204 0.027495 *  
## poly(sex_ratio_rg_l, 2)1      -9.02070    2.17035  -4.156 3.23e-05 ***
## poly(sex_ratio_rg_l, 2)2      -4.40190    1.96852  -2.236 0.025342 *  
## originNOR                      0.20020    0.08240   2.430 0.015109 *  
## sexM:originNOR                 0.35681    0.09384   3.802 0.000143 ***
## sexM:poly(sex_ratio_y_l, 2)1 -22.71332    4.11012  -5.526 3.27e-08 ***
## sexM:poly(sex_ratio_y_l, 2)2   2.81678    3.65475   0.771 0.440874    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 9145.3  on 7062  degrees of freedom
## Residual deviance: 8646.8  on 7049  degrees of freedom
## AIC: 12028
## 
## Number of Fisher Scoring iterations: 6
simulationOutput <- simulateResiduals(fittedModel = pois, plot = F)
plot(simulationOutput)

testZeroInflation(simulationOutput)

## 
##  DHARMa zero-inflation test via comparison to expected zeros with
##  simulation under H0 = fitted model
## 
## data:  simulationOutput
## ratioObsSim = 1.1251, p-value < 2.2e-16
## alternative hypothesis: two.sided

Conclusion
Model fit under poisson is very poor. Overdispersed, zero-inflated and a strong relationship between residuals and predictors. The model is struggling between fitting all the zeros and fitting the tlf > 1. We should not expect reliable parameter estimates or p-values out of a Poisson fit.

NegBin

Let’s see if estimating variance and mean separately (e.g. negative binomial) fixes the overdispersion.

glm_nb <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2), data = mm_data)
drop1(glm_nb, test = "Chisq")

Drop the sex* non-linear julian day interaction

glm_nb <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2)+ origin + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*jday_c, data = mm_data)
drop1(glm_nb, test = "Chisq")

Drop the sex* julian day interaction

glm_nb <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) +  origin + sex*origin + sex*poly(sex_ratio_y_l, 2), data = mm_data)
drop1(glm_nb, test = "Chisq")

Drop sex*origin effect

glm_nb <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) +  origin  + sex*poly(sex_ratio_y_l, 2), data = mm_data)
drop1(glm_nb, test = "Chisq")

Drop non-linear sex ratio of release group

glm_nb <- glm.nb(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + sex_ratio_rg_l +  origin  + sex*poly(sex_ratio_y_l, 2), data = mm_data)
drop1(glm_nb, test = "Chisq")

Final model is sex, release day (non-linear), release group density and sex ratio, origin and the interaction between sex and annual sex ratio.

Let’s evaluate model fit

summary(glm_nb)
## 
## Call:
## glm.nb(formula = tlf ~ poly(jday_c, 2) + sex + poly(density, 
##     2) + poly(sex_ratio_y_l, 2) + sex_ratio_rg_l + origin + sex * 
##     poly(sex_ratio_y_l, 2), data = mm_data, init.theta = 0.3062411142, 
##     link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9715  -0.7136  -0.6329  -0.4842   4.5046  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.07300    0.04816 -22.278  < 2e-16 ***
## poly(jday_c, 2)1              -1.96030    2.73844  -0.716 0.474085    
## poly(jday_c, 2)2             -10.80892    2.66116  -4.062 4.87e-05 ***
## sexM                          -0.21717    0.06413  -3.387 0.000708 ***
## poly(density, 2)1            -16.14074    3.58944  -4.497 6.90e-06 ***
## poly(density, 2)2             -6.29638    3.47900  -1.810 0.070324 .  
## poly(sex_ratio_y_l, 2)1       35.81521    4.37166   8.193 2.56e-16 ***
## poly(sex_ratio_y_l, 2)2       -4.99858    3.93373  -1.271 0.203836    
## sex_ratio_rg_l                -0.07779    0.03653  -2.129 0.033219 *  
## originNOR                      0.43704    0.09649   4.529 5.91e-06 ***
## sexM:poly(sex_ratio_y_l, 2)1 -26.95613    5.58222  -4.829 1.37e-06 ***
## sexM:poly(sex_ratio_y_l, 2)2   2.16107    5.19812   0.416 0.677600    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.3062) family taken to be 1)
## 
##     Null deviance: 4123.2  on 7062  degrees of freedom
## Residual deviance: 3895.1  on 7051  degrees of freedom
## AIC: 10334
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.3062 
##           Std. Err.:  0.0162 
## 
##  2 x log-likelihood:  -10307.5490
simulationOutput <- simulateResiduals(fittedModel = glm_nb, plot = F)
plot(simulationOutput)

testZeroInflation(simulationOutput)

## 
##  DHARMa zero-inflation test via comparison to expected zeros with
##  simulation under H0 = fitted model
## 
## data:  simulationOutput
## ratioObsSim = 1.0002, p-value = 0.952
## alternative hypothesis: two.sided

Conclusion
This is a good fit to the data. We could probablit do even better with random effects, but this demosntrates at least that the negbin fixes the problems of the poisson. There is no evidence of any model fit problems, nor a zero inflation problem. However, let’s look at zero-inflation models anyway.

ZINB

Here we fit a zero inflated negative binomial. Here we model zero TLF and non-zero TLF separately. For the zero part of the model we do not include any predictors. We can’t use the drop1 function to do model selection before, so we make (the rather big) assumption that model selection would have followed the same results as the negbin model and fit the final model after model selection using that distribution

zinb <- zeroinfl(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + sex_ratio_rg_l +  origin  + sex*poly(sex_ratio_y_l, 2) | 1, data = mm_data, dist = "negbin" )
summary(zinb)
## 
## Call:
## zeroinfl(formula = tlf ~ poly(jday_c, 2) + sex + poly(density, 2) + poly(sex_ratio_y_l, 
##     2) + sex_ratio_rg_l + origin + sex * poly(sex_ratio_y_l, 2) | 1, 
##     data = mm_data, dist = "negbin")
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4904 -0.4158 -0.3838 -0.3128 20.6884 
## 
## Count model coefficients (negbin with log link):
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.06918    0.05814 -18.389  < 2e-16 ***
## poly(jday_c, 2)1              -2.09297    2.70431  -0.774 0.438967    
## poly(jday_c, 2)2             -10.74675    2.74236  -3.919 8.90e-05 ***
## sexM                          -0.21754    0.06458  -3.369 0.000756 ***
## poly(density, 2)1            -16.28671    3.66348  -4.446 8.76e-06 ***
## poly(density, 2)2             -6.42102    3.64616  -1.761 0.078232 .  
## poly(sex_ratio_y_l, 2)1       35.46054    4.40165   8.056 7.87e-16 ***
## poly(sex_ratio_y_l, 2)2       -4.84028    3.88376  -1.246 0.212659    
## sex_ratio_rg_l                -0.07702    0.03796  -2.029 0.042430 *  
## originNOR                      0.43574    0.10343   4.213 2.52e-05 ***
## sexM:poly(sex_ratio_y_l, 2)1 -26.77708    5.62258  -4.762 1.91e-06 ***
## sexM:poly(sex_ratio_y_l, 2)2   1.99088    5.16378   0.386 0.699833    
## Log(theta)                    -1.17855    0.06974 -16.898  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)   -5.696      9.580  -0.595    0.552
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.3077 
## Number of iterations in BFGS optimization: 59 
## Log-likelihood: -5154 on 14 Df

Comparison

Let’s compare models. First let’s use a rootogram.

rootogram(pois, main = "Poisson")

rootogram(glm_nb, main = "Negative Binomial")

rootogram(zinb, main = "Zero-Inflated Negative Binomial")

Now let’s look at AIC and BIC

AIC(pois, glm_nb, zinb)
BIC(pois, glm_nb, zinb)

Finally, let’s conduct some likelihood ratio tests on nested models.

lmtest::lrtest(glm_nb, zinb )
## Warning in modelUpdate(objects[[i - 1]], objects[[i]]): original model was of
## class "negbin", updated model is of class "zeroinfl"

Conclusion

The model fits above overwhelmingly support a Negative Binomial over the Zero-Inflated or Poisson models. The Poisson is a very poor fit to the data. Zero-inflated negative binomial and negative binomial provide equaivalently good fits, so we should choose the one with fewer parameters.

Model Selection

Random Effects

Now lets fit the beyond optimal model. This has more fixed effects than we could possibly want in our final model, but that is desirable for finding the optimum random effect structure. We should also fit all fixed effects that could reasonably be non-linear using polynomials, as the linear model is nested within the polynomial and therefore the polynomial is better as the satured/beyond optimal.

mm_beyond_opt <- glmmTMB(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2) + (1|group) + (1|year), data = mm_data, family = nbinom2, REML = TRUE)

mm_beyond_opt2 <- glmmTMB(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2) + (1|year), data = mm_data, family = nbinom2, REML = TRUE)

mm_beyond_opt3 <- glmmTMB(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2) + (1|group) , data = mm_data, family = nbinom2, REML = TRUE)

mm_beyond_opt4 <- glmmTMB(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2) , data = mm_data, family = nbinom2, REML = TRUE)



AIC(mm_beyond_opt, mm_beyond_opt2, mm_beyond_opt3, mm_beyond_opt4)
BIC(mm_beyond_opt, mm_beyond_opt2, mm_beyond_opt3, mm_beyond_opt4)

lmtest::lrtest(mm_beyond_opt4, mm_beyond_opt)
lmtest::lrtest(mm_beyond_opt4, mm_beyond_opt2)
lmtest::lrtest(mm_beyond_opt4, mm_beyond_opt3)
summary(mm_beyond_opt)

AIC and BIC and LRTs agree the best fit to the data is the full random effects model including both release group and year.

Ultimately the correct random effects structure is a question about the inferences we’d like to make and AIC is only one piece of information to help us understand the releaitonship between parsimony and model fit.

For example, so many of our variables are calculated at the level of release group, and including release group as a random effect means these variables will be competing with random intercepts for release group. Same goes for year.

Since we are interested in evaluating the significance of predictors primarily, I’d prefer to be conservative and include them as random effectes. Failing to include year and release group as random effects implies that we are uninterested in the correlation within groups. Inferentially, this implies that the fixed effects at the level of release group or year (release group: sex ratio, density and year: n and sex ratio) are the only effects that might lead to correlation within the levels, and we freely pool across all levels of release group or year and attribute all variance to fixed effects.

We will include random effects for year and release group.

Fixed Effects

Now let’s turn to the otherside of the model and conduct model selection on the fixed effects.

We will conduct backwards stepwsie selection using Wald Tests and LRT, separately.

Non-Linear Effects

We have a lot of potential non-linear effects. This will make a typical backwards stepwise model selection procedure very painful.

Let’s fit a saturated model, with and without each non-linear effect and only retain the ones that improve the fit to the data according to LR tests and AIC.

Let’s refit the model using ML and take our first look

mm_beyond_opt <- glmmTMB(tlf ~ poly(jday_c,2) + sex + poly(density, 2) + poly(sex_ratio_y_l,2) + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*poly(sex_ratio_y_l, 2)+ sex*poly(jday_c,2) + (1|group) + (1|year), data = mm_data, family = nbinom2)

summary(mm_beyond_opt)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ poly(jday_c, 2) + sex + poly(density, 2) + poly(sex_ratio_y_l,  
##     2) + poly(sex_ratio_rg_l, 2) + origin + release + sex * origin +  
##     sex * poly(sex_ratio_y_l, 2) + sex * poly(jday_c, 2) + (1 |  
##     group) + (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10022.3  10180.1  -4988.2   9976.3     7040 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.06617  0.2572  
##  year   (Intercept) 0.30399  0.5514  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.424 
## 
## Conditional model:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                   -1.16026    0.21853  -5.309 1.10e-07 ***
## poly(jday_c, 2)1              -8.24732    5.20569  -1.584 0.113128    
## poly(jday_c, 2)2              -5.79633    4.89045  -1.185 0.235925    
## sexM                          -0.27044    0.07196  -3.758 0.000171 ***
## poly(density, 2)1              2.81447    5.41565   0.520 0.603278    
## poly(density, 2)2             -8.70523    4.89833  -1.777 0.075538 .  
## poly(sex_ratio_y_l, 2)1       28.80544   15.77002   1.827 0.067761 .  
## poly(sex_ratio_y_l, 2)2       -6.83263   16.20791  -0.422 0.673345    
## poly(sex_ratio_rg_l, 2)1       0.54113    4.22667   0.128 0.898127    
## poly(sex_ratio_rg_l, 2)2       1.95145    3.68914   0.529 0.596825    
## originNOR                      0.62994    0.17719   3.555 0.000378 ***
## releaseFrissel Bridge         -0.19263    0.24361  -0.791 0.429106    
## releaseHard Rock              -0.04497    0.14785  -0.304 0.761009    
## releaseHomestead              -0.04568    0.14778  -0.309 0.757223    
## releaseSlide Creek             0.61599    0.24912   2.473 0.013411 *  
## sexM:originNOR                 0.28740    0.14527   1.978 0.047880 *  
## sexM:poly(sex_ratio_y_l, 2)1 -24.50219    5.58427  -4.388 1.15e-05 ***
## sexM:poly(sex_ratio_y_l, 2)2   0.66531    5.46800   0.122 0.903158    
## poly(jday_c, 2)1:sexM         -2.24757    5.39766  -0.416 0.677119    
## poly(jday_c, 2)2:sexM          3.84762    5.36793   0.717 0.473511    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now let’s examine the of modeling effects of julian day, release group density, annual sex ratio and release group sex ratio as non-linear improves the model, in turn.

mm_beyond_opt_lin <- glmmTMB(tlf ~ jday_c + sex + density + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l+ sex*jday_c + (1|group) + (1|year), data = mm_data, family = nbinom2)

mm_beyond_opt_j <- glmmTMB(tlf ~ poly(jday_c,2) + sex + density + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l+ sex*poly(jday_c,2) + (1|group) + (1|year), data = mm_data, family = nbinom2)

mm_beyond_opt_s <- glmmTMB(tlf ~ jday_c + sex + density + poly(sex_ratio_y_l,2) + sex_ratio_rg_l + origin + release + sex*origin + sex*poly(sex_ratio_y_l,2)+ sex*jday_c + (1|group) + (1|year), data = mm_data, family = nbinom2)

mm_beyond_opt_d <- glmmTMB(tlf ~ jday_c + sex + poly(density,2) + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l+ sex*jday_c + (1|group) + (1|year), data = mm_data, family = nbinom2)

mm_beyond_opt_r <- glmmTMB(tlf ~ jday_c + sex + density + sex_ratio_y_l + poly(sex_ratio_rg_l,2) + origin + release + sex*origin + sex*sex_ratio_y_l+ sex*jday_c + (1|group) + (1|year), data = mm_data, family = nbinom2)

AIC(mm_beyond_opt, mm_beyond_opt_lin, mm_beyond_opt_j, mm_beyond_opt_s, mm_beyond_opt_d, mm_beyond_opt_r)
BIC(mm_beyond_opt, mm_beyond_opt_lin, mm_beyond_opt_j, mm_beyond_opt_s, mm_beyond_opt_d, mm_beyond_opt_r)

Conclusion Only fitting density as non-linear improves the fit, and only marginally so. We will keep it and remove any non-linear effect using our standard model selection.

So our starting model for model selection is below

mm_start <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l+ sex*jday_c + (1|group) + (1|year), data = mm_data, family = nbinom2)
summary(mm_start)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l +  
##     origin + release + sex * origin + sex * sex_ratio_y_l + sex *  
##     jday_c + (1 | group) + (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10014.1  10137.6  -4989.1   9978.1     7045 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.06813  0.2610  
##  year   (Intercept) 0.31548  0.5617  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.424 
## 
## Conditional model:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.3162747  0.2337900  -5.630 1.80e-08 ***
## jday_c                -0.0031665  0.0016659  -1.901 0.057334 .  
## sexM                  -0.1440000  0.0827003  -1.741 0.081644 .  
## poly(density, 2)1      3.0872426  5.4352051   0.568 0.570029    
## poly(density, 2)2     -8.6648008  4.9174879  -1.762 0.078063 .  
## sex_ratio_y_l          0.9132550  0.5183848   1.762 0.078115 .  
## sex_ratio_rg_l         0.0201285  0.0490036   0.411 0.681251    
## originNOR              0.6086404  0.1750268   3.477 0.000506 ***
## releaseFrissel Bridge -0.2064859  0.2432261  -0.849 0.395911    
## releaseHard Rock      -0.0391494  0.1481728  -0.264 0.791615    
## releaseHomestead      -0.0447585  0.1479057  -0.303 0.762183    
## releaseSlide Creek     0.6171644  0.2508596   2.460 0.013886 *  
## sexM:originNOR         0.2788344  0.1416324   1.969 0.048985 *  
## sexM:sex_ratio_y_l    -0.7751709  0.1778264  -4.359 1.31e-05 ***
## jday_c:sexM           -0.0007082  0.0017781  -0.398 0.690421    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Wald Tests

First interaction to remove is sex*jday

mm_wald <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)

summary(mm_wald)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l +  
##     origin + release + sex * origin + sex * sex_ratio_y_l + (1 |  
##     group) + (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10012.3  10128.9  -4989.1   9978.3     7046 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.06772  0.2602  
##  year   (Intercept) 0.31507  0.5613  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.424 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.315793   0.233593  -5.633 1.77e-08 ***
## jday_c                -0.003523   0.001404  -2.509 0.012097 *  
## sexM                  -0.143310   0.082670  -1.734 0.083003 .  
## poly(density, 2)1      3.001854   5.424175   0.553 0.579975    
## poly(density, 2)2     -8.753665   4.906285  -1.784 0.074395 .  
## sex_ratio_y_l          0.909403   0.517956   1.756 0.079131 .  
## sex_ratio_rg_l         0.018137   0.048690   0.372 0.709530    
## originNOR              0.607106   0.174874   3.472 0.000517 ***
## releaseFrissel Bridge -0.206853   0.242925  -0.852 0.394486    
## releaseHard Rock      -0.039718   0.147986  -0.268 0.788399    
## releaseHomestead      -0.044925   0.147705  -0.304 0.761011    
## releaseSlide Creek     0.615331   0.250461   2.457 0.014018 *  
## sexM:originNOR         0.280642   0.141532   1.983 0.047381 *  
## sexM:sex_ratio_y_l    -0.770733   0.177417  -4.344 1.40e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next is sex ratio of release group

mm_wald <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)

summary(mm_wald)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + origin +  
##     release + sex * origin + sex * sex_ratio_y_l + (1 | group) +  
##     (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10010.4  10120.2  -4989.2   9978.4     7047 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.0687   0.2621  
##  year   (Intercept) 0.3124   0.5589  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.424 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.324668   0.231825  -5.714 1.10e-08 ***
## jday_c                -0.003484   0.001404  -2.482 0.013064 *  
## sexM                  -0.139211   0.081942  -1.699 0.089339 .  
## poly(density, 2)1      2.587068   5.320652   0.486 0.626803    
## poly(density, 2)2     -8.885071   4.907494  -1.811 0.070217 .  
## sex_ratio_y_l          0.922321   0.514927   1.791 0.073266 .  
## originNOR              0.614082   0.174181   3.526 0.000423 ***
## releaseFrissel Bridge -0.198600   0.242495  -0.819 0.412795    
## releaseHard Rock      -0.035294   0.147941  -0.239 0.811441    
## releaseHomestead      -0.040906   0.147787  -0.277 0.781941    
## releaseSlide Creek     0.617516   0.251277   2.458 0.013990 *  
## sexM:originNOR         0.281611   0.141532   1.990 0.046621 *  
## sexM:sex_ratio_y_l    -0.768517   0.177331  -4.334 1.47e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Next to remove is non-linear effect of density.

mm_wald <- glmmTMB(tlf ~ jday_c + sex + density + sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)

summary(mm_wald)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ jday_c + sex + density + sex_ratio_y_l + origin + release +  
##     sex * origin + sex * sex_ratio_y_l + (1 | group) + (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10011.7  10114.7  -4990.9   9981.7     7048 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.06864  0.2620  
##  year   (Intercept) 0.31207  0.5586  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.423 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.411096   0.261099  -5.404 6.50e-08 ***
## jday_c                -0.003239   0.001397  -2.319  0.02040 *  
## sexM                  -0.139115   0.081920  -1.698  0.08947 .  
## density                0.001400   0.001471   0.952  0.34118    
## sex_ratio_y_l          1.062462   0.508861   2.088  0.03680 *  
## originNOR              0.521367   0.165979   3.141  0.00168 ** 
## releaseFrissel Bridge -0.232918   0.242035  -0.962  0.33588    
## releaseHard Rock      -0.040117   0.147979  -0.271  0.78632    
## releaseHomestead      -0.098709   0.144843  -0.681  0.49556    
## releaseSlide Creek     0.620544   0.251266   2.470  0.01352 *  
## sexM:originNOR         0.279735   0.141531   1.976  0.04810 *  
## sexM:sex_ratio_y_l    -0.765650   0.177247  -4.320 1.56e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Now density goes.

mm_wald <- glmmTMB(tlf ~ jday_c + sex  + sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)

summary(mm_wald)
##  Family: nbinom2  ( log )
## Formula:          
## tlf ~ jday_c + sex + sex_ratio_y_l + origin + release + sex *  
##     origin + sex * sex_ratio_y_l + (1 | group) + (1 | year)
## Data: mm_data
## 
##      AIC      BIC   logLik deviance df.resid 
##  10010.7  10106.7  -4991.3   9982.7     7049 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.06592  0.2567  
##  year   (Intercept) 0.30325  0.5507  
## Number of obs: 7063, groups:  group, 191; year, 9
## 
## Dispersion parameter for nbinom2 family (): 0.422 
## 
## Conditional model:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           -1.295645   0.228342  -5.674 1.39e-08 ***
## jday_c                -0.003431   0.001371  -2.502  0.01235 *  
## sexM                  -0.146965   0.081573  -1.802  0.07160 .  
## sex_ratio_y_l          1.018243   0.499938   2.037  0.04168 *  
## originNOR              0.454544   0.149419   3.042  0.00235 ** 
## releaseFrissel Bridge -0.210725   0.238724  -0.883  0.37739    
## releaseHard Rock      -0.050715   0.146428  -0.346  0.72908    
## releaseHomestead      -0.074082   0.141101  -0.525  0.59956    
## releaseSlide Creek     0.626176   0.248675   2.518  0.01180 *  
## sexM:originNOR         0.287140   0.141252   2.033  0.04207 *  
## sexM:sex_ratio_y_l    -0.749253   0.176463  -4.246 2.18e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Final model according to Wald Tests includes Julian Day, Sex, Annual Sex Ratio, Release Location, Origin, Sex * Origin and Sex * Annual Sex Ratio

LRT

mm_lrt <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l  + sex*jday_c+ (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

First to drop is the jday*sex interaction.

mm_lrt <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l + sex_ratio_rg_l + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

Then the release group sex ratio

mm_lrt <- glmmTMB(tlf ~ jday_c + sex + poly(density, 2) + sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

Then non-linear density effect.

mm_lrt <- glmmTMB(tlf ~ jday_c + sex + density + sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

Now density

mm_lrt <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + release + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

Uh-oh something different. Now we remove release location effect.

mm_lrt <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = mm_data, family = nbinom2)
drop1(mm_lrt, test = "Chisq")

Done. Final model includes sex, annual sex ratio, release day, origin and two interactions sex* origin and sex * annual sex ratio

Final Model

Stepwise selection according to Wald Test and likelihood ratio tests do not produce the same final model. Using likelihood ratio tests produces a sparse model that does not include the effect of release location compared to the Wald Test approach.

I think we should be more conservative here and use the final model from the LRT model selection procedure for a few reasons:
(1) For this question (which predictors of fitness are significant), I’m inclined towards being conservative, generally.
(2) There is moderate collinearity between origin and release location. Not enough to warrant excluding one or the other from the model selection procedure, but enough to elicit a small amount of concern.
(3) Following up on 2 above, while there is not strong collinearity between origin and release location, this is because a small number HORs from Cougar got placed at otherwise only hatchery outplant HOR relerase locations (such as homestead). So, if we restrict ourselves to the NOR cougar and hatchery outplant HOR comparison, we likely would have set off some multicollinearity flags

Let’s define our final model and collect the results

final_model <- mm_lrt
kable(broom.mixed::tidy(final_model) %>% select(effect, group, term, estimate, std.error, p.value), caption =  "Final Model Fit", digits = 3) %>% kable_classic(font = "Arial")
Final Model Fit
effect group term estimate std.error p.value
fixed (Intercept) -1.343 0.208 0.000
fixed jday_c -0.004 0.001 0.003
fixed sexM -0.150 0.082 0.066
fixed sex_ratio_y_l 1.042 0.501 0.038
fixed originNOR 0.446 0.132 0.001
fixed sexM:originNOR 0.293 0.142 0.039
fixed sexM:sex_ratio_y_l -0.750 0.177 0.000
ran_pars group sd__(Intercept) 0.291
ran_pars year sd__(Intercept) 0.553

Model validation

Let’s check on the fit.

simulateResiduals(final_model, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.6714492 0.6519188 0.1322061 0.8080857 0.4700555 0.3376268 0.1594444 0.9509483 0.4706737 0.8536594 0.1037052 0.004410544 0.4092714 0.483045 0.3475935 0.9851803 0.4080342 0.4874469 0.1771595 0.1913716 ...

Model fits excellently! The qq plot of simulated residuals shows a good fit, there is no evidence of overdispersion, and outliers don’t seem to drive the fit. We also don’t see any pattern to the residuals.

Effects

Here we interpret the model fit.

There are three significant main effects, and two significant interactions. There are also two random effects.

Main Effects There are 4 main effects in the final model, but one (sex) is not signficant and retained only because of it’s interaction with origin and annual sex ratio.

Interpreting the meaning of the fitted model parameters can be very challenging because of the link-function and the fact that multiple predictors are significant and interact.

We use the effects package to translate the estimated parameters from the model into something we can actually understand. The approach here predicts the effect of different levels of a focal predictor, holding all others constant at a fixed values.

These fixed values are determined differently for factor and numerical predictors. For numerical predictors (annual sex ratio and Julian Day of release), the fixed value is the simple mean. For factor predictors (sex and origin), we use a weighted average of the fitted values the levels of the factor, with weights proportional to the number of cases in each level.

require(effects)
plot(predictorEffect("origin",final_model),
lines=list(multiline=TRUE), confint=list(style="bars"))

plot(predictorEffect("sex_ratio_y_l",final_model), lines=list(multiline=TRUE), confint=list(style="bars"))

plot(predictorEffect("jday_c",final_model))

Origin: NOR salmon have substantially higher predicted fitness than HOR (Wald Test p = 0.000705) and this effect is somewhat stronger for males than females, but this interaction is only marginally significant and has a limited effect size (Wald Test = 0.0387). NOR males are predicted to be 2.1 fold more fit the HOR males, and NOR females are predicted to be 1.6 fold more fit than HOR females.
Sex Ratio: Overall, the annual sex ratio has a small effect on fitness, with male biased sex ratios producing somewhat higher fitness than females (Wald Test p = ), however this effect was much stronger for females than males (Wald Test p = ). Put in simple terms, changes in annual sex ratio affects fitness mostly through females, who perform worse when the sex ratio is female biased. Using the most extreme values in observed in any years (male:female ratio 0.6 and 2.0), female fitness is predicted to vary 3.5 fold, whereas male fitness is expected to vary 1.4 fold.
Release Day: Earlier releases are predicted to have greater fitness than later releases (Wald Test p value ). Individuals released on the earliest day are predicted to have 1.7 fold greater fitness than the latest release day.

Random Effects

The among year standard deviation in TLF was 0.5529 (log scale), and the among release group standard deviation was 0.2915 (log scale)

Publication Plots

Since the interaction is significant, let’s plot and examine the effects of sex and sex ratio together.

#let's not use the effect package plotting tools and just get the effects

#eff1 <- predictorEffect("sex_ratio_y_l", final_model, focal.levels = seq(-0.4,0.7,by = 0.1))
eff1 <- predictorEffect("sex_ratio_y_l", final_model)
## Warning in Effect.glmmTMB(ans, mod, x.var = 1, xlevels = xlevels, ...):
## overriding variance function for effects: computed variances may be incorrect
## Warning in Analyze.model(focal.predictors, mod, xlevels, default.levels, : the
## predictor jday_c is a one-column matrix that was converted to a vector
effdf <- as.data.frame(eff1)

effdf %<>%
  mutate(sex_ratio = exp(sex_ratio_y_l))

effdf %<>%
  mutate(fit_no_inter = exp((sex_ratio_y_l*1.041603)-(1.342716)))

actual_means <- mm_data %>%
  group_by(year, sex, sex_ratio_y) %>%
  summarise(mean_tlf = mean(tlf))
## `summarise()` has grouped output by 'year', 'sex'. You can override using the
## `.groups` argument.
ggplot(data = effdf, aes(x = (sex_ratio), y = fit, color = sex))+ 
  geom_line()+scale_x_continuous(trans = "log", n.breaks = 10) +
  xlab( bquote(atop("Sex Ratio",(N[male]/N[female])))) +
  geom_smooth( aes(ymin = lower, ymax = upper, fill = sex, colour = sex), stat = "identity") +
  theme_bw()+ylab("TLF")+coord_cartesian(ylim = c(0, 1.25)) + scale_color_manual(labels = c("Female", "Male"), name = "Sex", values = c("#228833", "#AA3377")) + scale_fill_manual(labels = c("Female", "Male"), name = "Sex", values = c("#228833", "#AA3377")) +
  geom_rug(data = mm_data, aes(x = sex_ratio_y, y = NULL, color = NULL)) 

The plot above shows the predicted TLF (lines and 95% confidence intervals) from the final mixed model fit, and the mean TLF from the empirical data (triangles) against sex ratio in a given year for both male and female Chinook salmon outplanted or reintroduced above Cougar dam from 2007 to 2015.

#let's not use the effect package plotting tools and just get the effects

#eff1 <- predictorEffect("sex_ratio_y_l", final_model, focal.levels = seq(-0.4,0.7,by = 0.1))
eff1 <- predictorEffect("origin", final_model)
## Warning in Effect.glmmTMB(ans, mod, x.var = 1, xlevels = xlevels, ...):
## overriding variance function for effects: computed variances may be incorrect
## Warning in Analyze.model(focal.predictors, mod, xlevels, default.levels, : the
## predictor jday_c is a one-column matrix that was converted to a vector
effdf <- as.data.frame(eff1)


ggplot(data = effdf, aes(x = (origin), y = fit, color = sex))+ 
  geom_point(position=position_dodge(width=0.3)) + 
  geom_errorbar(aes(ymin = lower, ymax = upper), position=position_dodge(width=0.3), width = 0.1)+scale_color_manual(labels = c("Female", "Male"), name = "Sex", values = c("#228833", "#AA3377"))+ylab("TLF")+xlab("Origin")+theme_bw()

#let's not use the effect package plotting tools and just get the effects

#eff1 <- predictorEffect("sex_ratio_y_l", final_model, focal.levels = seq(-0.4,0.7,by = 0.1))
eff1 <- predictorEffect("jday_c", final_model)
## Warning in Effect.glmmTMB(ans, mod, x.var = 1, xlevels = xlevels, ...):
## overriding variance function for effects: computed variances may be incorrect
## Warning in Analyze.model(focal.predictors, mod, xlevels, default.levels, : the
## predictor jday_c is a one-column matrix that was converted to a vector
effdf <- as.data.frame(eff1)

effdf %<>%
  mutate(jday = jday_c + 220)

actual_means <- mm_data %>%
  group_by(year, sex, sex_ratio_y) %>%
  summarise(mean_tlf = mean(tlf))
## `summarise()` has grouped output by 'year', 'sex'. You can override using the
## `.groups` argument.
ggplot(data = effdf, aes(x = (jday), y = fit))+ 
  geom_line()+
  geom_smooth( aes(ymin = lower, ymax = upper), stat = "identity") +
  theme_bw()+
  xlab("Julian Day of Release")+ylab("TLF") 

Other Analyses

This section of the notebook is dedicated to the numerous small questions and half-formed ideas that pop up during the work. It is much less well documented than the rest and is not intended to be shared widely.

Size effects

We have an estimate of size at maturity for about half (4321/8985) of our candidate parents. Our modeling tools are not compatible with missing data, so this would effectively cut our sample size in half. We chose not to fit it, but are examining it in a post hoc analysis in more detail here.

Size at maturity is positively correlated with fitness in many salmonid species. Fitness differences between HOR and NOR salmon are partially mediated through lower size of HOR vs NOR adults returning to spawn, which is in turn driven by both growth rates and differences in age at maturity for NOR and HOR salmon.

In this analysis we ask several questions:

  1. Is size a significant predictor of fitness?
  2. Do HOR and NOR salmon released above the dam vary in size?
  3. Are these size differences driven by differences in AAM between NOR and HOR fish?
  4. Can we parse the effects of size and origin? What is the best way to articulate the causal relationships between these variables and fitness? In other words do our data allow us to state things like “fitness differences between NOR and HOR candidate parents are driven by differences in size at maturity between HOR and NOR salmon”?

Analysis

mm_data_size <-  parents %>%
  filter( year <2016) %>%
  select(date, sex, release, rkm, origin, year, type, length, tlf) %>%
  drop_na() %>%
  mutate(jday = as.numeric(format(date, "%j"))) %>% #julian day in this case: days since the first day of the year
  mutate(jday_c = scale(jday, scale = F), #center the julian day to help with convergence
         sex = as.factor(sex),
         release= as.factor(release),
         year = as.factor(year),
         group = as.factor(paste(date, release, type)))

# The dataset include 7438 individuals

# lets add density
dens <- mm_data_size %>%
 group_by(jday, release, year, type) %>%
  summarise(density = n())

mm_data_size %<>%
  left_join(dens)

# lets add overall size of release in a year
dens <- mm_data_size %>%
 group_by(year) %>%
  summarise(annual_n = n())

mm_data_size %<>%
  left_join(dens)


# lets add sex ratio
#build release group sex ratio variable
f <- mm_data_size %>% 
  filter(sex == "F") %>%
  group_by(release, jday, year, type) %>% 
  summarise(n_female_rg = n()) 

mm_data_size %<>%
  left_join(f)

m <- mm_data_size %>% 
  filter(sex == "M") %>%
  group_by(release, jday, year, type) %>% 
  summarise(n_male_rg = n()) 

mm_data_size %<>%
  left_join(m) %>%
  mutate(sex_ratio_rg = n_male_rg/n_female_rg)

# maybe the release groups all mix thoroughly spawn together and we should fit sex ratio at the level of year
f <- mm_data_size %>% 
  filter(sex == "F") %>%
  group_by( year) %>% 
  summarise(n_female_y = n()) 

mm_data_size %<>%
  left_join(f)

m <- mm_data_size %>% 
  filter(sex == "M") %>%
  group_by( year) %>% 
  summarise(n_male_y = n()) 

mm_data_size %<>%
  left_join(m) %>%
  mutate(sex_ratio_y = n_male_y/n_female_y)


mm_data_size %<>%
  mutate(sex_ratio_rg_l = log(sex_ratio_rg),
         sex_ratio_y_l = log(sex_ratio_y))

# there are a bunch of days where the only males or only females are released, this means we wind up with NAs when trying to log transform release group level variable, reducing sample size by 375 individuals. If we don't use that variable (release group sex ratio), then we can get rid of this dataset and use one with the full set of individuals, let's remember this

mm_data_size %<>%
  drop_na()

# oops forgot to convert origin to a factor
mm_data_size %<>%
  mutate(origin = as.factor(origin))

Size effect

To assess the effect of size, we will compare the final model to a model with an additional effet of size and conduct LRT and Wald Tests

size_glmm <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + length+ (1|group) + (1|year), data = mm_data_size, family = nbinom2)
null_size_glmm <-  glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l +  (1|group) + (1|year), data = mm_data_size, family = nbinom2)

size_glmm2 <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + length*origin+ (1|group) + (1|year), data = mm_data_size, family = nbinom2)

AIC(size_glmm, null_size_glmm, size_glmm2)
BIC(size_glmm, null_size_glmm, size_glmm2)
anova(size_glmm, null_size_glmm, size_glmm2)
summary(size_glmm)
##  Family: nbinom2  ( log )
## Formula:          tlf ~ jday_c + sex + sex_ratio_y_l + origin + sex * origin +  
##     sex * sex_ratio_y_l + length + (1 | group) + (1 | year)
## Data: mm_data_size
## 
##      AIC      BIC   logLik deviance df.resid 
##   5190.9   5259.5  -2584.4   5168.9     3770 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  group  (Intercept) 0.05222  0.2285  
##  year   (Intercept) 0.12061  0.3473  
## Number of obs: 3781, groups:  group, 144; year, 6
## 
## Dispersion parameter for nbinom2 family (): 0.574 
## 
## Conditional model:
##                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -6.699216   0.488857 -13.704   <2e-16 ***
## jday_c             -0.002768   0.001597  -1.733   0.0831 .  
## sexM               -0.199615   0.110631  -1.804   0.0712 .  
## sex_ratio_y_l       0.318231   0.331833   0.959   0.3376    
## originNOR           0.211528   0.134681   1.571   0.1163    
## length              0.068985   0.005942  11.610   <2e-16 ***
## sexM:originNOR      0.397950   0.166145   2.395   0.0166 *  
## sexM:sex_ratio_y_l -0.563243   0.233486  -2.412   0.0159 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Yes, size is signficiantly predictive of TLF, but we already see the relationship with origin here. The origin effect is much reduced when also including size, suggesting correlation between these two predictors. There is no evidence of size*origin interaction.

Let’s look more closely at the relationship between sex, size at maturity, and origin.

We’ll make some summary plots and then fit a mixed model evaluating differences in length between sex and origin, using year as a random effect

ggplot(data = mm_data_size) + geom_boxplot(aes(origin, length, color = sex))+theme_bw()

ggplot(data = mm_data_size) + geom_boxplot(aes(year, length, color = origin))+theme_bw()

mm_data_size %>%
  group_by(origin, sex) %>% 
  summarise(mean = mean(length), sd = sd(length))
## `summarise()` has grouped output by 'origin'. You can override using the
## `.groups` argument.
mm_data_size %>%
  group_by(origin) %>% 
  summarise(mean = mean(length), sd = sd(length))
summary(lmer(length ~ origin + sex+ (1|year), data = mm_data_size))
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: length ~ origin + sex + (1 | year)
##    Data: mm_data_size
## 
## REML criterion at convergence: 25456.3
## 
## Scaled residuals: 
##      Min       1Q   Median       3Q      Max 
## -10.8389  -0.5697   0.0259   0.6053   4.8413 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev.
##  year     (Intercept)  3.524   1.877   
##  Residual             48.887   6.992   
## Number of obs: 3781, groups:  year, 6
## 
## Fixed effects:
##              Estimate Std. Error        df t value Pr(>|t|)    
## (Intercept)   75.8879     0.7910    5.4304  95.935 5.58e-10 ***
## originNOR      2.9210     0.2595 3722.6194  11.255  < 2e-16 ***
## sexM          -1.7845     0.2340 3776.6611  -7.625 3.07e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##           (Intr) orgNOR
## originNOR -0.112       
## sexM      -0.131 -0.148
drop1(lmer(length ~ origin + sex+ (1|year), data = mm_data_size), test = "Chisq")

NOR candidate parents are signficantly larger (beta = 2.92, se = 0.26, pvalue LRT < 2e-16), after controlling for the effect of sex and fitting year as a random effect.

So we’ve established larger candidate parents produce more offspring, NOR candidate parents are significantly larger, and NOR candidate parents have higher fitness. Can we attempt to parse these effects?

How much of the variation in fitness is due to origin alone and how much is due to size alone. Let’s estimate partial coefficients of determination for origin and size. This is rarely reported in glmms, but to my understanding there is no reason why it is inappropriate (Stoffel et al 2021)

# let's rescale some of the variable (fitting with lme4 which has a much harder time converging than glmmtmb)
#mm_data_size_scaled <- mm_data_size %>%
#  select(tlf, jday_c, sex, sex_ratio_y_l, origin,length, group, year) %>%
#  mutate(jday2 = jday_c/44,
#         length2 = length/10)



glmfit <- glm.nb(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + length+ group + year, data = mm_data_size_scaled)
glmfit <- glm.nb(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + length+ group + year, data = mm_data_size_scaled)

r2 <- rsq::rsq.partial(glmfit, adj = TRUE)
r2b <- rsq::rsq.glmm(lm4fit)

Okay, holding off on this for now because there is no straightforard implementation in R for glmmTMB, or any negbin glmm for that matter. It seems like it can be done according to Nakagawa 2017 (https://royalsocietypublishing.org/doi/10.1098/rsif.2017.0213). We “just” do commonality analysis on our model fit (https://besjournals.onlinelibrary.wiley.com/doi/pdf/10.1111/2041-210X.12166). The challenge is taking the time to parse the model object in R, and fitting conditioned models to calculate the partial R2s.

Are these size differences driven by differences in AAM between NOR and HOR fish?
This was specifically proposed in the 2014 report. ### Conclusions

Larger candidate parents produce more offspring. NOR candidate parents are significantly larger. NOR candidate parents have higher fitness. Ultimately, we need to do some variance partitioning through commonality analysis to parse how much of the origin effect is mediated by the difference in size between NOR and HOR candidate parents.

Mate Pairs

What is the relative fitness of different mate pair types: HORxHOR, HORxNOR, NORxHOR and NORxNOR (female x male)?

Let’s only consider the years where both NOR and HOR salmon are released above the dam (i.e. 2010 and later), and only the years where at least age 5 offspring have returned ( 2015 and earlier).

We will also only examine the subset of the pedigree where both parents are known.

Let’s start with the way everyone else seems to ask this: is the fitness of NxH or HxN crosses lower than NxN.

# Let's filter the 
mate_pair <- pedigree_meta %>%
  filter(assn_type == "pair", parent_year < 2016, parent_year > 2009) %>%
  left_join(select(parents, sample_id, origin, tlf), by = c("father" = "sample_id")) %>%
  rename(father_origin = origin, father_tlf = tlf) %>%
  left_join(select(parents, sample_id, origin, tlf), by = c("mother" = "sample_id")) %>%
  rename(mother_origin = origin, mother_tlf = tlf) %>%
  mutate(cross = case_when(mother_origin == "NOR" & father_origin == "HOR" ~ "NxH",
                           mother_origin == "HOR" & father_origin == "NOR" ~ "HxN",
                           mother_origin == "NOR" & father_origin == "NOR" ~ "NxN",
                           mother_origin == "HOR" & father_origin == "HOR" ~ "HxH"))

mate_pair %>%
  group_by(cross) %>%
  summarise(mean_father_tlf = mean(father_tlf), mean_mother_tlf = mean(mother_tlf))

Yes. The fitness of fathers is much lower in a NxH cross than an NxN cross. But the fitness of females is NOT lower in a HxN cross than a NxN cross!!! But the problem here is that we’re not controlling for individual. Mean tlf in the table above includes offspring from other mate pairs.

for each mate_pair we need to calculate its own tlf!

mp_counts <- mate_pair %>%
  unite("pair", mother, father, sep = "__", remove = FALSE) %>%
  count(pair) %>%
  rename(pair_tlf = n)

mate_pair %<>%
  unite("pair", mother, father, sep = "__", remove = FALSE) %>%
  left_join(mp_counts)
## Joining, by = "pair"
mate_pair %>%
  group_by(parent_year, cross) %>%
  summarise(n = n(), mean_tlf_pair = mean(pair_tlf), sd_tlf_pair = sd(pair_tlf)) %>%
  pivot_wider(id_cols = parent_year, names_from = cross, values_from = mean_tlf_pair )
## `summarise()` has grouped output by 'parent_year'. You can override using the
## `.groups` argument.

Is this significant?

mate_pair %<>%
  mutate(year_f = as.factor(parent_year),
         cross_f = as.factor(cross)) %>%
  mutate(cross_f = fct_relevel(cross_f, "NxN",   "HxN", "NxH","HxH"))
  
mm_pair <- glmmTMB(pair_tlf ~ cross_f + (1|year_f), data = mate_pair, family = nbinom2)
mm_pair_null <- glmmTMB(pair_tlf ~  (1|year_f), data = mate_pair, family = nbinom2)
summary(mm_pair)
##  Family: nbinom2  ( log )
## Formula:          pair_tlf ~ cross_f + (1 | year_f)
## Data: mate_pair
## 
##      AIC      BIC   logLik deviance df.resid 
##   1807.6   1833.8   -897.8   1795.6      574 
## 
## Random effects:
## 
## Conditional model:
##  Groups Name        Variance Std.Dev.
##  year_f (Intercept) 0.0387   0.1967  
## Number of obs: 580, groups:  year_f, 6
## 
## Dispersion parameter for nbinom2 family ():  326 
## 
## Conditional model:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.54156    0.10313   5.251 1.51e-07 ***
## cross_fHxN   0.07012    0.07470   0.939   0.3479    
## cross_fNxH  -0.06766    0.11119  -0.609   0.5428    
## cross_fHxH  -0.20915    0.10107  -2.069   0.0385 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans( mm_pair, "cross_f", type = "response")
##  cross_f response    SE  df lower.CL upper.CL
##  NxN         1.72 0.177 574     1.40     2.10
##  HxN         1.84 0.176 574     1.53     2.22
##  NxH         1.61 0.201 574     1.26     2.05
##  HxH         1.39 0.156 574     1.12     1.74
## 
## Confidence level used: 0.95 
## Intervals are back-transformed from the log scale
plot(emmeans( mm_pair, "cross_f", type = "response"))

#TukeyHSD(aov(data = filter(parents, year == 2015, type %in% c("outplant", "reintro_above", "reintro")), tlf ~ type))

ggplot(data = as.data.frame(emmeans( mm_pair, "cross_f", type = "response")), aes(x = cross_f, y = response, ymin = response-SE, ymax = response+SE)) + geom_pointrange()+theme_classic()+scale_x_discrete(labels = c("NxN\nn=157","HxN\nn=238", "NxH\nn=68", "HxH\nn=117"), name = "Cross Type\n(female x male)")+ylab("Estimated Pair Fitness")

#let's put this in terms of RRS
mm_pair_emmeans <- as.data.frame(emmeans( mm_pair, "cross_f", type = "response"))


mm_pair_emmeans %<>%
  mutate(rrs = response/1.718690,
         rrs_se = SE/1.718690)

ggplot(data =mm_pair_emmeans, aes(x = cross_f, y = rrs, ymin = rrs-rrs_se, ymax = rrs+rrs_se)) + geom_pointrange()+theme_classic()+scale_x_discrete(labels = c("NxN\nn=157","HxN\nn=238", "NxH\nn=68", "HxH\nn=117"), name = "Cross Type\n(female x male)")+ylab("Relative Reproductive Success")

Can we get more power if we reduce overdispersion by including additional vairable we know are important, such as release date?

mate_pair %<>%
  mutate(mother_jday = as.numeric(format(mother_date, "%j")),
         mother_jday_c = scale(mother_jday, scale = F),
         father_jday = as.numeric(format(father_date, "%j")),
        father_jday_c = scale(father_jday, scale = F) )

mm_pair_date <- glmmTMB(pair_tlf ~ cross_f + father_jday_c+ mother_jday_c+ (1|year_f), data = mate_pair, family = nbinom2)
## Warning in fitTMB(TMBStruc): Model convergence problem; non-positive-definite
## Hessian matrix. See vignette('troubleshooting')
#nope won't converge


# what about getting rid of random effect to deal with convergence problems.
mm_pair <- glm.nb(pair_tlf ~ cross_f + year_f, data = mate_pair)
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached

## Warning in theta.ml(Y, mu, sum(w), w, limit = control$maxit, trace =
## control$trace > : iteration limit reached
## Warning in glm.nb(pair_tlf ~ cross_f + year_f, data = mate_pair): alternation
## limit reached
# this also doesnt converge

Couldn’t converge.

Chi Squared test of Mate Pair Type Frequencies

So the

The null expectation is that the frequency of the different mate pair types should be predicted from the frequencies of NOR females, NOR males, HOR females and HOR males.

#mate_pair %>%
#  group_by(cross) %>%
#  summarise(mean_father_tlf = mean(father_tlf), mean_mother_tlf = mean(mother_tlf))

# counts table
ncand <- parents %>%
  count(year, origin, sex) %>%
  pivot_wider(id_cols = year, values_from = n, names_from = c(origin, sex)) %>%
  select(-starts_with("NA"))
mate_pair %>%
  count( parent_year, cross) %>% pivot_wider(id_cols = parent_year, values_from = n, names_from = cross) %>%
  left_join(ncand, by = c("parent_year" = "year"))
# frequency table
fcand <- parents %>%
  filter(year < 2016, year > 2009) %>%
  group_by(year, origin, sex) %>%
  summarise(n = n()) %>%
  group_by(year) %>%
  mutate(freq = n / sum(n)) %>%
  select(-n) %>%
  pivot_wider(id_cols = year, values_from = freq, names_from = c(origin, sex)) %>%
  select(-starts_with("NA"))
## `summarise()` has grouped output by 'year', 'origin'. You can override using the
## `.groups` argument.
mate_pair_freqs <- mate_pair %>%
  count( parent_year, cross) %>% pivot_wider(id_cols = parent_year, values_from = n, names_from = cross) %>%
  mutate(f_HxH = HxH/(HxH+HxN+NxH+NxN),
        f_HxN = HxN/(HxH+HxN+NxH+NxN),
        f_NxH = NxH/(HxH+HxN+NxH+NxN),
        f_NxN = NxN/(HxH+HxN+NxH+NxN)) %>%
  left_join(fcand, by = c("parent_year" = "year")) %>%
  mutate(e_HxH = HOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_HxN = HOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M),
         e_NxH = NOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_NxN = NOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M)) %>%
  select(parent_year, starts_with("f"), starts_with("e")) #clean up 


mate_pair_freqs_long <- mate_pair_freqs %>%
  pivot_longer(!parent_year, values_to = "freqs", names_to = c("obs_exp", "cross"), names_pattern = "([fe])_(.*)" ) %>%
  mutate(obs_exp = case_when(obs_exp == "f" ~ "observed",
                             obs_exp == "e" ~ "expected"))# %>%
#  mutate(cross = factor(cross, labels = c("HOR_female x HOR_male", "HOR_female x NOR_male", "NOR_female x HOR_male", "NOR_female x NOR_male")))

ggplot(data = mate_pair_freqs_long, aes(x = parent_year, y = freqs, color = obs_exp))+geom_point()+ facet_grid(~cross)+theme(axis.text.x = element_text(angle = 90))+ylab("Frequency")+xlab("Parent Year")

ggplot(data = mate_pair_freqs_long, aes(x = cross, y = freqs, color = obs_exp))+geom_point()+ theme(axis.text.x = element_text(angle = 90))+facet_grid(cols = vars(parent_year))

#chisq.test(x = c(24, 29 , 3 ,11), p = c(0.518, 0.265,  0.143, 0.0734)) #2010, p = 0.0002
#chisq.test(x = c(19, 89 ,19, 56), p = c(0.262,  0.288,0.214,  0.236)) #2011,  p = 3.0e-12
#chisq.test(x = c(7,  52 ,  20 , 41), p = c(0.219, 0.364, 0.157,  0.260)) #2012 p = 0.0002

In all years there are fewer HORxHOR, and more HORxNOR and NORxNOR mate pairs than expected given the frequencies of HOR and NOR males and females. The observed number of NORxHOR is not consistently different than random expectation.

As for hypothesis testing, the overall Chi-squared goodness of fit test isn’t telling us much. If we know NORs are more fit than HORs then the divergence between observed and expected for HORxHOR and/or NORxNOR could be driving the overall significant values for any given year. For any particular cross type and year, we would need something like a Fisher’s exact test. We should also adjust the p-values given that there are many tests (up to 24).

Need to think about this more, perhaps it is best to figure out how best to run simulations and get p-values that way. For now doing a Chi-Square Test

# get the raw counts back for hyp testing
mate_pair_freqs <- mate_pair %>%
  count( parent_year, cross) %>% pivot_wider(id_cols = parent_year, values_from = n, names_from = cross) %>%
  mutate(f_HxH = HxH/(HxH+HxN+NxH+NxN),
        f_HxN = HxN/(HxH+HxN+NxH+NxN),
        f_NxH = NxH/(HxH+HxN+NxH+NxN),
        f_NxN = NxN/(HxH+HxN+NxH+NxN)) %>%
  left_join(fcand, by = c("parent_year" = "year")) %>%
  mutate(e_HxH = HOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_HxN = HOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M),
         e_NxH = NOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_NxN = NOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M))

# then add a column for HxN and NxH specific Chi-Squared Test
mate_pair_freqs %>%
  rowwise() %>%
  mutate(HxN_test = chisq.test(x = c(HxN, HxH+NxH+NxN), p = c(e_HxN, (e_HxH+e_NxH+e_NxN)))$p.value<0.00208,
         NxH_test = chisq.test(x = c(NxH, HxH+HxN+NxN), p = c(e_NxH, (e_HxH+e_HxN+e_NxN)))$p.value<0.00208,
         NxN_test = chisq.test(x = c(NxN, HxH+HxN+NxH), p = c(e_NxN, (e_HxH+e_HxN+e_NxH)))$p.value<0.00208,
         HxH_test = chisq.test(x = c(HxH, NxH+HxN+NxN), p = c(e_HxH, (e_NxH+e_HxN+e_NxN)))$p.value<0.00208,
         overall_chi_test = chisq.test(x = c(HxH, NxH,HxN,NxN), p = c(e_HxH, e_NxH,e_HxN,e_NxN))$p.value)
## Warning in chisq.test(x = c(NxH, HxH + HxN + NxN), p = c(e_NxH, (e_HxH + : Chi-
## squared approximation may be incorrect

## Warning in chisq.test(x = c(NxH, HxH + HxN + NxN), p = c(e_NxH, (e_HxH + : Chi-
## squared approximation may be incorrect
## Warning in chisq.test(x = c(NxN, HxH + HxN + NxH), p = c(e_NxN, (e_HxH + : Chi-
## squared approximation may be incorrect

## Warning in chisq.test(x = c(NxN, HxH + HxN + NxH), p = c(e_NxN, (e_HxH + : Chi-
## squared approximation may be incorrect

## Warning in chisq.test(x = c(NxN, HxH + HxN + NxH), p = c(e_NxN, (e_HxH + : Chi-
## squared approximation may be incorrect
## Warning in chisq.test(x = c(HxH, NxH, HxN, NxN), p = c(e_HxH, e_NxH, e_HxN, :
## Chi-squared approximation may be incorrect

## Warning in chisq.test(x = c(HxH, NxH, HxN, NxN), p = c(e_HxH, e_NxH, e_HxN, :
## Chi-squared approximation may be incorrect

## Warning in chisq.test(x = c(HxH, NxH, HxN, NxN), p = c(e_HxH, e_NxH, e_HxN, :
## Chi-squared approximation may be incorrect
#lets also add these all up

Is this the right way to think about it though? Right now we include the same mate pair multiple times if they produced multiple offspring. If we’re trying to get at sexual selection shouldn’t we only consider UNIQUE mate pairs?

Let’s try again.

#mate_pair %>%
#  group_by(cross) %>%
#  summarise(mean_father_tlf = mean(father_tlf), mean_mother_tlf = mean(mother_tlf))

# counts table
ncand <- parents %>%
  count(year, origin, sex) %>%
  pivot_wider(id_cols = year, values_from = n, names_from = c(origin, sex)) %>%
  select(-starts_with("NA"))

mate_pair %>%
  count( parent_year, cross) %>% pivot_wider(id_cols = parent_year, values_from = n, names_from = cross) %>%
  left_join(ncand, by = c("parent_year" = "year"))
# frequency table
fcand <- parents %>%
  filter(year < 2016, year > 2009) %>%
  group_by(year, origin, sex) %>%
  summarise(n = n()) %>%
  group_by(year) %>%
  mutate(freq = n / sum(n)) %>%
  select(-n) %>%
  pivot_wider(id_cols = year, values_from = freq, names_from = c(origin, sex)) %>%
  select(-starts_with("NA"))
## `summarise()` has grouped output by 'year', 'origin'. You can override using the
## `.groups` argument.
mate_pair_freqs <- mate_pair %>%
  distinct(pair, .keep_all = TRUE) %>%
  count( parent_year, cross) %>% pivot_wider(id_cols = parent_year, values_from = n, names_from = cross) %>%
  mutate(f_HxH = HxH/(HxH+HxN+NxH+NxN),
        f_HxN = HxN/(HxH+HxN+NxH+NxN),
        f_NxH = NxH/(HxH+HxN+NxH+NxN),
        f_NxN = NxN/(HxH+HxN+NxH+NxN)) %>%
  left_join(fcand, by = c("parent_year" = "year")) %>%
  mutate(e_HxH = HOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_HxN = HOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M),
         e_NxH = NOR_F/(HOR_F+NOR_F) * HOR_M/(HOR_M+NOR_M),
         e_NxN = NOR_F/(HOR_F+NOR_F) * NOR_M/(HOR_M+NOR_M)) %>%
  select(parent_year, starts_with("f"), starts_with("e")) #clean up 


mate_pair_freqs_long <- mate_pair_freqs %>%
  pivot_longer(!parent_year, values_to = "freqs", names_to = c("obs_exp", "cross"), names_pattern = "([fe])_(.*)" ) %>%
  mutate(obs_exp = case_when(obs_exp == "f" ~ "observed",
                             obs_exp == "e" ~ "expected"))# %>%
#  mutate(cross = factor(cross, labels = c("HOR_female x HOR_male", "HOR_female x NOR_male", "NOR_female x HOR_male", "NOR_female x NOR_male")))

ggplot(data = mate_pair_freqs_long, aes(x = parent_year, y = freqs, color = obs_exp))+geom_point()+ facet_grid(~cross)+theme(axis.text.x = element_text(angle = 90))+xlab("Parent Year")+ylab("Frequency")

ggplot(data = mate_pair_freqs_long, aes(x = cross, y = freqs, color = obs_exp))+geom_point()+ theme(axis.text.x = element_text(angle = 90))+facet_grid(cols = vars(parent_year))

# let's do the chi square tests
## 2010
mate_pair %>%
  filter(parent_year == 2010) %>%
  count(cross_f)
mate_pair_freqs_long %>%
  filter(parent_year == 2010, obs_exp == "expected") 
# NxH
#(order reminder: NxN + HxH + HxN, vs NxH)
chisq.test(x = c(11+29+24,3), p = c(0.07343745+0.26540552+0.51786444,  0.14329259 )) #2010, p = 0.021
## 
##  Chi-squared test for given probabilities
## 
## data:  c(11 + 29 + 24, 3)
## X-squared = 5.2971, df = 1, p-value = 0.02136
chisq.test(x = c(19, 89 ,19, 56), p = c(0.262,  0.288,0.214,  0.236)) #2011,  p = 3.0e-12
## 
##  Chi-squared test for given probabilities
## 
## data:  c(19, 89, 19, 56)
## X-squared = 56.652, df = 3, p-value = 3.049e-12
chisq.test(x = c(7,  52 ,  20 , 41), p = c(0.219, 0.364, 0.157,  0.260)) #2012 p = 0.0002
## 
##  Chi-squared test for given probabilities
## 
## data:  c(7, 52, 20, 41)
## X-squared = 18.879, df = 3, p-value = 0.0002896

F1 vs F2 vs NORImmigrant Fitness

I still don’t think the approach is correct/ideal in the sections looking at F1 vs F2 fitness above. Let’s try again.

First some definitions:

F0: HOR candidate parents F1s: NOR descendants of HOR parents
F2s: Descendants of F1s
NOR immigrants: NOR individuals sampled 2013 or later with no assigned parents.

Let’s also define our data:
2015 and earlier: since we are interested in TLF must limit ourselves to 2015 and earlier. 2013 and later: all possible parents must have been sampled
Candidate parent: since we are interested in TLF, the individual must have been evaluated as a candidate parent.

Okay now let’s create the dataset.

#first, filter to match the rules above and create variable defining F1, F2, HOR outplant, and NOR immigrant

# note 1 since there hasn't been enough time for F3s by 2015, we can make coding easier and define any samples with two NOR parents as a F2)
# we will need a pedigree with origin attached
pedigree_origin <- pedigree %>%
  left_join(select(dedup, sample_id, origin), by = c("mother" = "sample_id")) %>%
  rename(mother_origin = origin) %>%
  left_join(select(dedup, sample_id, origin), by = c("father" = "sample_id")) %>%
  rename(father_origin = origin)

F12_data <- dedup %>%
  filter(year> 2011 , year < 2016, cand_parent == TRUE) %>%
  left_join(pedigree_origin, by = c("sample_id" = "offspring_sample_id")) %>%  # attach parents and their origin 
  mutate(generation = case_when(origin == "HOR" ~ "F0",
                                mother_origin == "HOR" & father_origin == "HOR" ~"F1",
                                mother_origin == "NOR" & father_origin == "NOR" ~"F2", #see note 1 above for why this makes sense
                                mother == "none" & father == "none" ~ "NORimmigrant",
                                mother == "none" | father == "none" ~ "single_parentage",
                                mother_origin == "HOR" & father_origin == "NOR" ~"HORxNOR",
                                mother_origin == "NOR" & father_origin == "HOR" ~"NORxHOR")) %>%
  left_join(select(parents, sample_id, tlf) )
## Joining, by = "sample_id"
F12_data %>%
  count(year, generation) %>%
  pivot_wider(id_cols = year, names_from = generation, values_from = n)

Sample sizes for F1s may be sufficient, but NOR immigrants and F2s are very low. LEt’s fit the model to look for an effect.

F12_mmdata <- F12_data %>%
  filter(generation %in% c("F0", "F1", "F2", "NORimmigrant")) %>%
  mutate(year_f = as.factor(year))

mm_f12 <- glm.nb(tlf ~ generation + year_f, data = F12_mmdata)
mm_f12_null <- glm.nb(tlf ~   year_f, data = F12_mmdata)
DHARMa::simulateResiduals(mm_f12, plot =TRUE)
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.5375951 0.7283119 0.7306923 0.8973635 0.2469138 0.05567503 0.4188951 0.4686232 0.0547014 0.8391782 0.7166583 0.228115 0.8132221 0.1399357 0.116907 0.1484173 0.5088604 0.1646787 0.370011 0.5108912 ...
DHARMa::testOutliers(mm_f12, type = "binomial", nBoot = 20000 )
## DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

## 
##  DHARMa outlier test based on exact binomial test with approximate
##  expectations
## 
## data:  mm_f12
## outliers at both margin(s) = 32, observations = 2772, p-value = 0.04145
## alternative hypothesis: true probability of success is not equal to 0.007968127
## 95 percent confidence interval:
##  0.007909127 0.016257928
## sample estimates:
## frequency of outliers (expected: 0.00796812749003984 ) 
##                                             0.01154401
# marginal significance for outliers when including 2012, consider not including 2012

AIC(mm_f12_null, mm_f12)
drop1(mm_f12, test = "Chisq") 
summary(mm_f12)
## 
## Call:
## glm.nb(formula = tlf ~ generation + year_f, data = F12_mmdata, 
##     init.theta = 0.5595557925, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9457  -0.6844  -0.5358  -0.4964   3.1339  
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -1.6636     0.1016 -16.372  < 2e-16 ***
## generationF1             0.7677     0.1184   6.484 8.93e-11 ***
## generationF2             0.4858     0.4577   1.061 0.288483    
## generationNORimmigrant   0.8562     0.1493   5.735 9.78e-09 ***
## year_f2013               0.4287     0.1205   3.558 0.000374 ***
## year_f2014              -0.1464     0.1400  -1.046 0.295632    
## year_f2015              -0.3181     0.1506  -2.113 0.034631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.5596) family taken to be 1)
## 
##     Null deviance: 1691.1  on 2771  degrees of freedom
## Residual deviance: 1569.6  on 2765  degrees of freedom
## AIC: 3415.6
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.5596 
##           Std. Err.:  0.0709 
## 
##  2 x log-likelihood:  -3399.6480

Model fit is good according to simulated residuals, both the generation (F0, F1, F2 and NORimmigrant) and year (fixed effect factor three years) significantly improve the fit to the data according to LRT and AIC.

Let’s look at the predicted effects

eff1 <- predictorEffect("generation", mm_f12)
effdf <- as.data.frame(eff1)


ggplot(data = effdf, aes(x = (generation), y = fit))+ 
  geom_point(position=position_dodge(width=0.3)) + 
  geom_errorbar(aes(ymin = lower, ymax = upper), position=position_dodge(width=0.3), width = 0.1)+ylab("TLF")+xlab("Generation")+theme_bw()

Let’s do some post hoc testing

# we'll use emmeans for this
em <- emmeans(mm_f12, "generation")
contrast(em, "pairwise", adjust = "Tukey")
##  contrast          estimate    SE  df z.ratio p.value
##  F0 - F1            -0.7677 0.118 Inf  -6.484  <.0001
##  F0 - F2            -0.4858 0.458 Inf  -1.061  0.7130
##  F0 - NORimmigrant  -0.8562 0.149 Inf  -5.735  <.0001
##  F1 - F2             0.2819 0.470 Inf   0.599  0.9322
##  F1 - NORimmigrant  -0.0885 0.159 Inf  -0.558  0.9443
##  F2 - NORimmigrant  -0.3704 0.478 Inf  -0.775  0.8660
## 
## Results are averaged over the levels of: year_f 
## Results are given on the log (not the response) scale. 
## P value adjustment: tukey method for comparing a family of 4 estimates

Yep, sample size is too low for F2s to say much new.

The only novel finding here is that NORimmigrants specifically have greater fitness than HORs. F2 sample size is too low to draw any conclusions. Also, given the length of our dataset, there will be a bias towards younger F2s descended from younger parents. We need to remember to take this into account.

F123_data <- dedup %>%
  filter(year> 2011 , cand_parent == TRUE) %>%
  left_join(pedigree_origin, by = c("sample_id" = "offspring_sample_id")) %>%  # attach parents and their origin 
  mutate(generation = case_when(origin == "HOR" ~ "F0",
                                mother_origin == "HOR" & father_origin == "HOR" ~"F1",
                                mother_origin == "NOR" & father_origin == "NOR" ~"F2", #see note 1 above for why this makes sense
                                mother == "none" & father == "none" ~ "NORimmigrant",
                                mother == "none" | father == "none" ~ "single_parentage",
                                mother_origin == "HOR" & father_origin == "NOR" ~"HORxNOR",
                                mother_origin == "NOR" & father_origin == "HOR" ~"NORxHOR")) %>%
  left_join(select(parents, sample_id, tlf) )
## Joining, by = "sample_id"
F123_data %>%
  count(year, generation) %>%
  pivot_wider(id_cols = year, names_from = generation, values_from = n)

Genetic Diversity

How does the level of genetic diversity vary between HORs, F1s, F2s and NOR immigrants?

We will use He as our metric.

# first let's define our groups
div_data <- dedup %>%
  filter(year> 2011 , cand_parent == TRUE) %>%
  left_join(pedigree_origin, by = c("sample_id" = "offspring_sample_id")) %>%  # attach parents and their origin 
  mutate(generation = case_when(origin == "HOR" ~ "F0",
                                mother_origin == "HOR" & father_origin == "HOR" ~"F1",
                                mother_origin == "NOR" & father_origin == "NOR" ~"F2", #see note 1 above for why this makes sense
                                mother == "none" & father == "none" ~ "NORimmigrant",
                                mother == "none" | father == "none" ~ "single_parentage",
                                mother_origin == "HOR" & father_origin == "NOR" ~"HORxNOR",
                                mother_origin == "NOR" & father_origin == "HOR" ~"NORxHOR")) %>%
  left_join(select(parents, sample_id, tlf) ) %>%
  select(generation, sample_id, starts_with("Ot")) %>%
  unite(Ot201, starts_with("Ot201"), sep = "_") %>%
  unite(Ot209, starts_with("Ot209"), sep = "_") %>%
  unite(Ot249, starts_with("Ot249"), sep = "_") %>%
  unite(Ot253, starts_with("Ot253"), sep = "_") %>%
  unite(Ot215, starts_with("Ot215"), sep = "_") %>%
  unite(Ot311, starts_with("Ot311"), sep = "_") %>%
  unite(Ot409, starts_with("Ot409"), sep = "_") %>%
  unite(Ot211, starts_with("Ot211"), sep = "_") %>%
  unite(Ot208, starts_with("Ot208"), sep = "_") %>%
  unite(Ot212, starts_with("Ot212"), sep = "_") %>%
  unite(Ot515, starts_with("Ot515"), sep = "_") %>%
  column_to_rownames(var = "sample_id") %>%
  filter(generation %in% c("F0", "F1", "NORimmigrant"))
## Joining, by = "sample_id"
require(adegenet)
## Loading required package: adegenet
## Warning: package 'adegenet' was built under R version 3.6.2
## Loading required package: ade4
## Warning: package 'ade4' was built under R version 3.6.2
## Registered S3 method overwritten by 'spdep':
##   method   from
##   plot.mst ape 
## 
##    /// adegenet 2.1.3 is loaded ////////////
## 
##    > overview: '?adegenet'
##    > tutorials/doc/questions: 'adegenetWeb()' 
##    > bug reports/feature requests: adegenetIssues()
div_genind <- df2genind(select(div_data, starts_with("Ot")), ploidy = 2, sep = "_")
div_genind@pop <-  as.factor(div_data$generation)

n.pop <- seppop(div_genind)

# diversity
hexp <- lapply(n.pop, function(x) (summary(x)$Hexp))
hexp <- as.data.frame(t(do.call(rbind, hexp)))
hexp <- hexp %>%
  rownames_to_column(var="marker") %>%
  pivot_longer(-marker, names_to = "generation", values_to = "He")

ggplot(data = hexp)+geom_boxplot(aes(x = generation , y = He))

hexp %>% 
  group_by(generation) %>%
  summarise(mean_he = mean(He))
#significance testing
#a <- Hs.test(div_genind[pop = "F1"], div_genind[pop = "F0"], n.sim = 1000, alter = "less")
#sig

#b <- Hs.test(div_genind[pop = "F0"], div_genind[pop = "NORimmigrant"], n.sim = 1000, alter = "greater")
# marginal si

#c <- Hs.test(div_genind[pop = "F1"], div_genind[pop = "NORimmigrant"], n.sim = 1000, alter = "less")
# marginal sig

Deprecated analyses

Analyses/code that either has minor issues (like insufficient power), is messy, deals with a very minor question or poorly conceived. The most important questions addressed in this section have been addressed in the Other Analyses section above

F1 Fitness

Deprecated, now see Mate Pairs

What is the relative fitness of offspring descended from HORxHOR, HORxNOR and NORxNOR crosses

I previously stated this result only has relevance for the manuscript. Essentially, I thought the main inference provided by this analysis was whether or not the effects of hatchery selection/rearing were reversible after a single generation in the “wild”.

Upon further reflection, I believe this is an important result to present to managers in this report.

Higher fitness of F1s descended from NORxNOR crosses than from NORxHOR indicates that supplementation with HOR outplants reduces the fitness of NOR reintros.

If this is the case, future management decisions must weigh the demographic boost provided by hatchery outplants against the deleterious genetic (or otherwise heritable) effects of the spawning between HOR and NOR salmon above the dam.

Clearly, with CRRs well below one, hatchery supplementation is required for the maintenance of the population, but as CRR is expected to improve as downstream passage facilities come online, managers may wish to know if hatchery supplementation has a deleterious effect on the fitness of reintroduced NORs.

This seems particularly important because pHOS above the dam is likely very high barring mcuh stronger pre-spawn mortality among HORs than NORs.

Prep
First we need to identify all individuals in the pedigree that are both candidate parents with TLF (i.e. <= 2015 sampling), and are assigned as offspring to both parents.

f1s <- pedigree_meta %>%
  filter( offspring_sample_id %in% parents$sample_id, assn_type == "pair") %>%
  left_join(select(dedup, sample_id, origin), by = c("father" = "sample_id")) %>%
  rename(father_origin = origin) %>%
  left_join(select(dedup, sample_id, origin), by = c("mother" = "sample_id")) %>%
  rename(mother_origin = origin) %>%
  left_join(select(parents, sample_id, tlf), by = c("offspring_sample_id" = "sample_id")) %>%
  mutate(cross = case_when(father_origin != mother_origin ~ "HxN",
                           father_origin == "NOR" & mother_origin == "NOR" ~ "NxN",
                           father_origin == "HOR" & mother_origin == "HOR" ~ "HxH"))

mm_data_for_join <-  mm_data %>%
  select(year, sex_ratio_y_l) %>%
  mutate(year = as.numeric(as.character(year))) %>%
  distinct(year, .keep_all = TRUE)
  

f1s %<>%
  left_join(mm_data_for_join, by = c("offspring_year" = "year")) %>%
  mutate(offspring_year = as.factor(offspring_year),
         jday_c = scale(yday(offspring_date), scale = F)) %>%
  mutate(group = as.factor(paste(offspring_date, offspring_type))) %>%
  select(offspring_sample_id, tlf, jday_c , offspring_sex , cross,  sex_ratio_y_l, group, offspring_year)

f1s %<>%
  drop_na()

#f1s_2014 <- f1s

EDA Let’s do a little EDA, specifically let’s look at balance among years.

kable(f1s %>%
        count(offspring_year, cross) %>%
        pivot_wider(id_cols = offspring_year, names_from = cross, values_from = n), caption = "Individuals with both known parents, by year and parent cross type") %>% kable_classic(full_width = F, html_font = "Arial" )
Individuals with both known parents, by year and parent cross type
offspring_year HxH HxN NxN
2010 3
2011 110
2012 275
2013 127 1
2014 48 22 8
2015 15 46 22

It’s quite clear that only data from two years, 2014 and 2015, is appropriate to use to address our question here. With such a small sample size, my original plan (fit glmm with all signficant predictors from the glmm on fitness for all candidate parents) is not going to work. We don’t have the degrees of freedom, nor is it appropriate to fit random effects with just 2 levels (year).

We will fit a relatively simple model to start including the effects of cross type, sex, year and the interaction of cross type and sex.

Given the sample size and the very high variance in TLF, I expect that we do not have enough power to identify differences. We’ll run the code anyway as I don’t think the type I error is inflated with these small sample sizes, but I don’t expect to include these results in the report or manuscript.

Assortative mating There’s an interesting result hiding just out of sight here. We can use the empirical vs expected proportion of HxN crossses (much like heterozygotes) to evalute if there is assortative mating between salmon of the same origin.

For 2014, the frequencies of HxH, HxN and NxN cross is (0.62, 0.28 and 0.10, respectively) which results in frequency of H = 0.756 and N = 0.244. The expected HxN cross frequency (under random mating) from these “genotype” frequencies is 0.368. So there are fewer “heterozygotes” than expected by chance, suggesting that there is assortative mating.

For 2015, the frequencies of HxH, HxN and NxN cross is (0.18, 0.554 and 0.265, respectively) which results in frequency H = .458 and frequency N = 0.542. The expected HxN cross frequency (under random mating) is 0.496. So there are slightly more “heterozygotes” than expected by chance, suggesting that there is assortative mating.

Note that in addition to “true” assortative mating driven by mate choice, there’s is also the possibility that salmon do not distribute far from their release site, or spawn soon after release and this signal of assortative mating is actually driven by NOR and HOR being more likely to be released together at the same time and place.

Still need to think about how best to estimate the expected number here, should it actually use the frequencies of candidate parents, not successful parents? Won’t publish this anyway, sample size is so small given the amount of variance in reproductive success.

Modeling Next we need a model to remove the other effects (most importantly year).

f1s %<>%
  filter(offspring_year %in% c(2014, 2015))

glmm_f1 <- glmmTMB(tlf ~  cross + offspring_year +offspring_sex +cross*offspring_sex, data = f1s, family = nbinom2)
drop1(glmm_f1, test = "Chisq")
#let's get rid of that interaction term (p = 0.43, yay more DF) and try again
glmm_f1 <- glmmTMB(tlf ~  cross + offspring_year +offspring_sex, data = f1s, family = nbinom2)
drop1(glmm_f1, test = "Chisq")
#let's get rid of the effect of year as well term (p = 0.18, yay more DF) and try again
glmm_f1 <- glmmTMB(tlf ~  cross + offspring_sex , data = f1s, family = nbinom2)
drop1(glmm_f1, test = "Chisq")
simulateResiduals(glmm_f1, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.8058881 0.3460519 0.3666661 0.996 0.6523142 0.9595412 0.7433763 0.9875738 0.6633605 0.4691904 0.438068 0.6045504 0.8818968 0.1943889 0.6653053 0.5327208 0.4164635 0.3025705 0.7081926 0.698483 ...
summary(glmm_f1)
##  Family: nbinom2  ( log )
## Formula:          tlf ~ cross + offspring_sex
## Data: f1s
## 
##      AIC      BIC   logLik deviance df.resid 
##    222.8    238.2   -106.4    212.8      156 
## 
## 
## Dispersion parameter for nbinom2 family (): 0.532 
## 
## Conditional model:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)     -1.3055     0.3578  -3.649 0.000263 ***
## crossHxN        -0.6280     0.4151  -1.513 0.130361    
## crossNxN        -0.5142     0.5314  -0.968 0.333234    
## offspring_sexM   0.5880     0.3959   1.485 0.137457    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(predictorEffect("cross", glmm_f1))
## Warning in Effect.glmmTMB(ans, mod, x.var = 1, xlevels = xlevels, ...):
## overriding variance function for effects: computed variances may be incorrect

As expected, no difference detected.

F1s by parent sex

Deprecated, now see Mate Pairs

What is the relative fitness of offspring descended from HORxHOR, HORxNOR and NORxNOR crosses, but this time sex of parent matters

Prep
First we need to identify all individuals in the pedigree that are both candidate parents with TLF (i.e. <= 2015 sampling), and are assigned as offspring to both parents.

f1s <- pedigree_meta %>%
  filter( offspring_sample_id %in% parents$sample_id, assn_type == "pair") %>%
  left_join(select(dedup, sample_id, origin), by = c("father" = "sample_id")) %>%
  rename(father_origin = origin) %>%
  left_join(select(dedup, sample_id, origin), by = c("mother" = "sample_id")) %>%
  rename(mother_origin = origin) %>%
  left_join(select(parents, sample_id, tlf), by = c("offspring_sample_id" = "sample_id")) %>%
  mutate(cross = case_when(father_origin == "HOR" & mother_origin == "NOR" ~ "HxN",
                           father_origin == "NOR" & mother_origin == "HOR" ~ "NxH",
                           father_origin == "NOR" & mother_origin == "NOR" ~ "NxN",
                           father_origin == "HOR" & mother_origin == "HOR" ~ "HxH"))

mm_data_for_join <-  mm_data %>%
  select(year, sex_ratio_y_l) %>%
  mutate(year = as.numeric(as.character(year))) %>%
  distinct(year, .keep_all = TRUE)
  

f1s %<>%
  left_join(mm_data_for_join, by = c("offspring_year" = "year")) %>%
  mutate(offspring_year = as.factor(offspring_year),
         jday_c = scale(yday(offspring_date), scale = F)) %>%
  mutate(group = as.factor(paste(offspring_date, offspring_type))) %>%
  select(offspring_sample_id, tlf, jday_c , offspring_sex , cross,  sex_ratio_y_l, group, offspring_year)

f1s %<>%
  drop_na()

#f1s_2014 <- f1s

EDA Let’s do a little EDA, specifically let’s look at balance among years.

kable(f1s %>%
        count(offspring_year, cross) %>%
        pivot_wider(id_cols = offspring_year, names_from = cross, values_from = n), caption = "F1 sample sizes, by year and parent cross type, origins of parents are labeled Male x Female") %>% kable_classic(full_width = F, html_font = "Arial" )
F1 sample sizes, by year and parent cross type, origins of parents are labeled Male x Female
offspring_year HxH HxN NxH NxN
2010 3
2011 110
2012 275
2013 127 1
2014 48 2 20 8
2015 15 9 37 22

It’s quite clear that only data from two years, 2014 and 2015, is appropriate to use to address our question here. With such a small sample size, my original plan (fit glmm with all signficant predictors from the glmm on fitness for all candidate parents) is not going to work. We don’t have the degrees of freedom, nor is it appropriate to fit random effects with just 2 levels (year).

We will fit a relatively simple model to start including the effects of cross type, sex, year and the interaction of cross type and sex

Assortative mating There’s an interesting result hiding just out of sight here. We can use the empirical vs expected proportion of HxN crossses (much like heterozygotes) to evalute if there is assortative mating between salmon of the same origin.

For 2014, the frequencies of HxH, HxN and NxN cross is (0.62, 0.28 and 0.10, respectively) which results in frequency of H = 0.756 and N = 0.244. The expected HxN cross frequency (under random mating) from these “genotype” frequencies is 0.368. So there are fewer “heterozygotes” than expected by chance, suggesting that there is assortative mating.

For 2015, the frequencies of HxH, HxN and NxN cross is (0.18, 0.554 and 0.265, respectively) which results in frequency H = .458 and frequency N = 0.542. The expected HxN cross frequency (under random mating) is 0.496. So there are slightly more “heterozygotes” than expected by chance, suggesting that there is assortative mating.

Note that in addition to “true” assortative mating driven by mate choice, there’s is also the possibility that salmon do not distribute far from their release site, or spawn soon after release and this signal of assortative mating is actually driven by NOR and HOR being more likely to be released together at the same time and place.

Still need to think about how best to estimate the expected number here, should it actually use the frequencies of candidate parents, not successful parents? Won’t publish this anyway, sample size is so small given the amount of variance in reproductive success.

Modeling Next we need a model to remove the other effects (most importantly year).

f1s %<>%
  filter(offspring_year %in% c(2014, 2015))

glmm_f1 <- glm.nb(tlf ~  cross + offspring_year +offspring_sex, data = f1s)

simulateResiduals(glmm_f1, plot = TRUE)

## Object of Class DHARMa with simulated residuals based on 250 simulations with refit = FALSE . See ?DHARMa::simulateResiduals for help. 
##  
## Scaled residual values: 0.8986826 0.2340413 0.4131193 1 0.7101315 0.98197 0.6737199 0.9875601 0.8176709 0.5428316 0.214586 0.637842 0.9323766 0.6237055 0.1910791 0.2950659 0.4914603 0.5064995 0.7853128 0.613115 ...
rootogram(glmm_f1)

drop1(glmm_f1, test = "Chisq")
summary(glmm_f1)
## 
## Call:
## glm.nb(formula = tlf ~ cross + offspring_year + offspring_sex, 
##     data = f1s, init.theta = 0.6371064023, link = log)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.9836  -0.7263  -0.6054  -0.4498   2.6819  
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -1.45062    0.37619  -3.856 0.000115 ***
## crossHxN            0.03604    0.66682   0.054 0.956893    
## crossNxH           -1.19094    0.49801  -2.391 0.016783 *  
## crossNxN           -0.75981    0.55307  -1.374 0.169498    
## offspring_year2015  0.48402    0.41989   1.153 0.249019    
## offspring_sexM      0.60790    0.38964   1.560 0.118720    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.6371) family taken to be 1)
## 
##     Null deviance: 105.121  on 160  degrees of freedom
## Residual deviance:  95.498  on 155  degrees of freedom
## AIC: 221.93
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.637 
##           Std. Err.:  0.330 
## 
##  2 x log-likelihood:  -207.927
plot(predictorEffect("cross", glmm_f1))

glmm_f1_2015 <- glmmTMB(tlf ~  cross , data = f1s_2015, family = nbinom2)

simulateResiduals(glmm_f1_2015, plot = TRUE)
drop1(glmm_f1_2015, test = "Chisq")

summary(glmm_f1_2015)

plot(predictorEffect("cross", glmm_f1_2015))

Great-Grandparentage

Deprecated, now see Mate Pairs

This section is still mostly a dumping ground for code, needs to be documented and organized into something easily read by others.

Comparing F2 to F1 Fitness note: take code chunk below and revise it to use the pedigree_meta, to keep track of parent and grandparent origin, also note that starting from pedigree means only offsping are included. probably best o use the code below to id f1s and f2s, the append these labels to the dedup dataset, along with fitness, after filtering to remove individuals that are not candidate parents

grandparents <- pedigree %>%
  mutate(offspring_also_parent = case_when(offspring_sample_id %in% c(mother, father) ~ TRUE,
                                           TRUE ~ FALSE)) %>%
  left_join(select(dedup, sample_id, year), by = c("offspring_sample_id" = "sample_id"))


# F1s are individuals descended from known parents 
F1_list <- grandparents %>%
  filter(mother != "none" | father != "none") %>%
  pull(offspring_sample_id)

#F2s are individuals descended from F1s
F2_list <- grandparents %>%
  filter(mother %in% F1_list | father %in% F1_list) %>%
  pull(offspring_sample_id)
  
grandparents %<>%
   mutate(F1 = offspring_sample_id %in% F1_list,
          F2 = offspring_sample_id %in% F2_list) 

F2grandparent_list_dam <- grandparents %>% filter(F1 == TRUE, F2 == TRUE, offspring_also_parent == TRUE) %>% pull(mother)
F2grandparent_list_sire <- grandparents %>% filter(F1 == TRUE, F2 == TRUE, offspring_also_parent == TRUE) %>% pull(father)

grandparents %>% 
  filter(offspring_sample_id %in% F2grandparent_list_dam | offspring_sample_id %in% F2grandparent_list_sire )
kable(count(grandparents, F1, F2, offspring_also_parent) %>%
  rename(known_parents = F1, known_grandparents = F2, known_offspring = offspring_also_parent)) %>% kable_classic(full_width = F, html_font = "Cambria" ) 
known_parents known_grandparents known_offspring n
FALSE FALSE FALSE 1049
FALSE FALSE TRUE 197
TRUE FALSE FALSE 1036
TRUE FALSE TRUE 242
TRUE TRUE FALSE 365
TRUE TRUE TRUE 26
# here we present the F2s broken down by return year. they need to return in 2015 or earlier for us to have a TLF
kable(grandparents %>% filter(F2 == TRUE) %>% count(year), caption = "F2 Sample Sizes, by return year") %>% kable_classic(full_width = F, html_font = "Cambria" ) 
F2 Sample Sizes, by return year
year n
2014 3
2015 43
2016 79
2017 95
2018 54
2019 37
2020 80
# at some point we also need to subdivide these into those that descend from HORxHOR, HORxNOR and NORxNOR crosses. Ideally we would compare F0 fitness to F1 fitness to F2 fitness, but the F1s and F2s would all need to be NOR

f2s <- grandparents %>%
  left_join(parent_counts, by = c("offspring_sample_id" = "parent") ) %>%
  rename(fitness = n) %>%
  mutate(fitness = as.numeric(fitness),
         fitness = case_when(is.na(fitness) ~ 0,
                             TRUE ~ fitness)) %>%
  filter(F2 == TRUE, offspring_sample_id %in% parents$sample_id,  year < 2017)




f1s <- grandparents %>%
  left_join(parent_counts, by = c("offspring_sample_id" = "parent") ) %>%
  rename(fitness = n) %>%
  mutate(fitness = as.numeric(fitness),
         fitness = case_when(is.na(fitness) ~ 0,
                             TRUE ~ fitness)) %>%
  filter(F1 == TRUE, offspring_sample_id %in% parents$sample_id, year < 2017) 

a <- f2s %>% 
  group_by(year) %>%
  summarise(mean_fitness_f2 = mean(fitness), n_f2 = n())
  
b <- f1s %>% 
  group_by(year) %>%
  summarise(mean_fitness_f1 = mean(fitness), n_f1 = n())

f0s <- grandparents %>%
  left_join(parent_counts, by = c("offspring_sample_id" = "parent") ) %>%
  rename(fitness = n) %>%
  mutate(fitness = as.numeric(fitness),
         fitness = case_when(is.na(fitness) ~ 0,
                             TRUE ~ fitness)) %>%
  filter(F1 == FALSE, offspring_sample_id %in% parents$sample_id, year < 2017) # pretty sure this isnt working the right way

a <- f2s %>% 
  group_by(year) %>%
  summarise(mean_fitness_f2 = mean(fitness), n_f2 = n())
  
b <- f1s %>% 
  group_by(year) %>%
  summarise(mean_fitness_f1 = mean(fitness), n_f1 = n())

c <- f0s %>% 
  group_by(year) %>%
  summarise(mean_fitness_f0 = mean(fitness), n_f0 = n())

left_join(b, a) %>%
  left_join(c)
## Joining, by = "year"Joining, by = "year"

There are 3 years where we can compare TLF of F1s and F2s in our dataset, 2014, 2015 and 2016.

In 2014, there are only 3 F2s. In 2015, the mean fitness of an F1 (ignoring parent origin, n = 123) was 0.31. The F2s (n = 28) had lower fitness, 0.28. In 2016, we only have years 3 and 4 offspring. F2s were way more fit than F1s, but this could have more to do with a change in AAM between F2s and F1s than any difference in TLF. We need 2021 retunrs to compare.

Conclusion
Not enough F2s with TLF to make a comparison.

Release location

We did not find a significant effect of release location. This can be explained one of two ways, either salmon stay near their release locations and there is little effect of spawning location on fitness, or salmon have time to distribute to preferred habitats after release. We can evaluate this using the pedigree.

Is there evidence that parents are more likely to spawn with other parents at the same release location than by chance alone?

For now let’s look at the rate that parents spwan with a parent from a different relesae location, we can add a statistical analysis later.

pedigree %>%
  left_join(select(dedup, sample_id, release), by = c("mother" = "sample_id")) %>%
  rename(mother_release = release) %>%
  left_join(select(dedup, sample_id, release), by = c("father" = "sample_id")) %>%
  rename(father_release = release) %>%
  filter(mother != "none", father != "none") %>%
  summarise(same_release = sum(mother_release == father_release, na.rm = TRUE), different_release = sum(mother_release != father_release, na.rm = TRUE))

For the 1153 parentages in the pedigree where we assign both parents and know the release location of both parents, 571 are between parents released in the same location and 582 are not. This is similar to previous results (47.2% of mate pairs released in different locations). Given that there are more than two release locations in most years the finding that ~half of mate pairs are from the same location suggests there is some tendency to spawn near the release site, but further analysis will have to wait.

Diversity by Source

Deprecated, now see Genetic Diversity

This is deprecated, use the “genetic diversity” section instead

Let’s estimate the genetic diversity and the number of private alleles among NORs produced above the dam, NOR immigrants at Cougar Trap and HOR outplants

#define groups

nor_immigrant_list <- pedigree_meta %>%
  filter(assn_type == "none") %>%
  pull(offspring_sample_id)

div_data <- dedup %>%
  filter(year > 2012) %>% # only look at years where we can assign an offspring to all of their cand parents)
  mutate(div_group = case_when(origin == "HOR" ~ "hor",
                               sample_id %in% nor_immigrant_list ~ "immigrant", 
                               TRUE ~ "F1")) %>%
  select(div_group, sample_id, starts_with("Ot")) %>%
  unite(Ot201, starts_with("Ot201"), sep = "_") %>%
  unite(Ot209, starts_with("Ot209"), sep = "_") %>%
  unite(Ot249, starts_with("Ot249"), sep = "_") %>%
  unite(Ot253, starts_with("Ot253"), sep = "_") %>%
  unite(Ot215, starts_with("Ot215"), sep = "_") %>%
  unite(Ot311, starts_with("Ot311"), sep = "_") %>%
  unite(Ot409, starts_with("Ot409"), sep = "_") %>%
  unite(Ot211, starts_with("Ot211"), sep = "_") %>%
  unite(Ot208, starts_with("Ot208"), sep = "_") %>%
  unite(Ot212, starts_with("Ot212"), sep = "_") %>%
  unite(Ot515, starts_with("Ot515"), sep = "_") %>%
  column_to_rownames(var = "sample_id")
  

require(adegenet)

div_genind <- df2genind(select(div_data, starts_with("Ot")), ploidy = 2, sep = "_")
div_genind@pop <-  as.factor(div_data$div_group)

n.pop <- seppop(div_genind)

# diversity
hexp <- lapply(n.pop, function(x) (summary(x)$Hexp))
hexp <- as.data.frame(t(do.call(rbind, hexp)))
hexp <- hexp %>%
  rownames_to_column(var="marker") %>%
  pivot_longer(-marker, names_to = "div_group", values_to = "He")

ggplot(data = hexp)+geom_boxplot(aes(x = div_group , y = He))

#significance testing
a <- Hs.test(div_genind[pop = "F1"], div_genind[pop = "hor"], n.sim = 100, alter = "less")
b <- Hs.test(div_genind[pop = "hor"], div_genind[pop = "immigrant"], n.sim = 100, alter = "greater")
c <- Hs.test(div_genind[pop = "F1"], div_genind[pop = "immigrant"], n.sim = 100, alter = "less")

# private alleles
#require(poppr)
#priv <- private_alleles(div_genind, count.alleles = FALSE)
#as.data.frame(priv) %>%
#  rownames_to_column(var = "group") %>%
#  mutate(sum_private = rowSums(across(where(is.numeric)))) %>%
#  select(group, sum_private)

# private_alleles, subsample to 500 individuals, eventually should make this into a bootstrapping thing, 
#sample_div <- dedup %>%
#  filter(year > 2012) %>% # only look at years where we can assign an offspring to all of their cand parents)
#  mutate(div_group = case_when(origin == "HOR" ~ "hor",
#                               sample_id %in% nor_immigrant_list ~ "immigrant", 
#                               TRUE ~ "F1")) %>%
#  group_by(div_group) %>%
#  slice_sample(n = 500, replace = F) %>%
#  pull(sample_id)

#priv_sample <- private_alleles(div_genind[indNames(div_genind) %in% sample_div], count.alleles = FALSE)
#as.data.frame(priv_sample) %>%
#  rownames_to_column(var = "group") %>%
#  mutate(sum_private = rowSums(across(where(is.numeric)))) %>%
#  select(group, sum_private)

Summary
Offspring assigned to parents above the dam have lower genetic diversity than either NOR immigrants or hatchery outplants, but, importantly, NOR immigrants do not differ from HOR outplants in average genetic diversity.

Interestingly there are private alleles in each of the following groups, hatchery outplants, offspring assigned to a parent above the dam and NOR immigrants (17, 11, and 14), but this is not a great comparison because the sample size differs by group. When subsampling to 500 individuals from each group, samples of NOR immigrants contain many more “private alleles.” This suggests that there are some rare alleles in HOR outplants that occur at higher frequency in NOR immigrants.

NSNT sex ratio

The effect of sex ratio is totally different here than in NSNT report, despite similar range of sex ratios above the dams across years. Let’s summarise the sex ratio, fitness and pHOS data into a single place to help us think about this

z <- parents %>% filter(year <2016) %>% group_by(year, origin) %>% summarise(n = n()) %>% pivot_wider(id_cols = year, names_from = "origin", values_from = "n") %>% mutate(perc_nor = NOR/(HOR+NOR))
## `summarise()` has grouped output by 'year'. You can override using the `.groups`
## argument.
m <- parents %>% filter(sex == "M") %>% count(as.numeric(year)) %>% rename(n_male = n, year = "as.numeric(year)")
f <- parents %>% filter(sex == "F") %>% count(as.numeric(year)) %>% rename(n_female = n, year = "as.numeric(year)")

fit <- parents %>% group_by(year) %>% summarise(tlf = mean(tlf))

z2 <- z %>% left_join(m) %>%
  left_join(f) %>%
  mutate(sex_ratio = n_male/n_female) %>%
  left_join(fit) %>%
  replace_na(list(perc_nor = 0)) 
## Joining, by = "year"Joining, by = "year"Joining, by = "year"
z2

Let’s also examine if there is an pHOS by sex ratio interaction.

z3 <- z2 %>%
  mutate(year = as.factor(year))

mm_data_z <- mm_data %>%
  left_join(select(z3, year, perc_nor))

mm_z <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + origin*perc_nor + (1|group) + (1|year), data = mm_data_z, family = nbinom2)

mm_z2 <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + (1|group) + (1|year), data = filter(mm_data, year %in% c( "2010", "2012", "2015", "2011", "2013", "2014")), family = nbinom2)

mm_z3 <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l   + sex*sex_ratio_y_l + (1|group) + (1|year), data = filter(mm_data, year %in% c( "2007", "2008", "2009")), family = nbinom2)

plot(predictorEffect("sex_ratio_y_l",mm_z2), lines=list(multiline=TRUE), confint=list(style="bars"))

plot(predictorEffect("sex_ratio_y_l",mm_z3), lines=list(multiline=TRUE), confint=list(style="bars"))


mm_z <- glmmTMB(tlf ~ jday_c + sex +  sex_ratio_y_l  + origin + sex*origin + sex*sex_ratio_y_l + annual_n*sex_ratio_y_l + (1|group) + (1|year), data = mm_data_z, family = nbinom2)

succesful spawners counts

Nothing wrong with this code, in deprecated just to keep track of code.

kable(parents %>% filter(tlf > 0) %>% count(year, origin) %>% pivot_wider(id_cols = year, names_from = "origin", values_from = "n"), caption = "successful spawners (tlf >0), by origin and year") %>%
  kableExtra::kable_classic(full_width = FALSE)
successful spawners (tlf >0), by origin and year
year HOR NOR
2007 261
2008 229
2009 156
2010 65 40
2011 91 118
2012 77 129
2013 79 73
2014 68 22
2015 71 32
2016 119 51

Genotypic vs Phenotypic Sex

Nothing wrong with this code, in deprecated section because not major question.

How well do phenotypic and genotypic sex agree?

Some notes:
(1) We have phenotypic and genotypic sex for all hatchery outplants except 2017.
(2) Phenotypic males are divided into males and jacks. I consider both males for the calculations below.

# overall agreement
dedup %>% 
  filter(type == "hatchery_outplant") %>%
  mutate(pheno_sex_2 = case_when(pheno_sex == "J" ~ "M",
                                 TRUE ~ pheno_sex)) %>%
  summarise(geno_pheno_agree = mean(geno_sex == pheno_sex_2, na.rm=TRUE))
#agreement per year
dedup %>% 
  filter(type == "hatchery_outplant") %>%
  group_by(year) %>%
  mutate(pheno_sex_2 = case_when(pheno_sex == "J" ~ "M",
                                 TRUE ~ pheno_sex)) %>%
  summarise(geno_pheno_agree = mean(geno_sex == pheno_sex_2, na.rm=TRUE))

Also, the goal for hatchery outplants is a roughly 2:1 female to male sex ratio. How well is this accomplished?

geno_sex_table <- dedup %>% 
  filter(type == "hatchery_outplant") %>% 
  count(year, geno_sex) %>%
  pivot_wider(id_cols = year, names_from = "geno_sex", values_from = n) %>%
  rename(geno_F = F, geno_M = M) %>%
  mutate(geno_sex_ratio = geno_F/geno_M)


pheno_sex_table <- dedup %>% 
  filter(type == "hatchery_outplant") %>% 
  mutate(pheno_sex_2 = case_when(pheno_sex == "J" ~ "M",
                                 TRUE ~ pheno_sex)) %>%
  count(year, pheno_sex_2) %>%
  pivot_wider(id_cols = year, names_from = "pheno_sex_2", values_from = n) %>%
  rename(pheno_F = F, pheno_M = M) %>%
  mutate(pheno_sex_ratio = pheno_F/(pheno_M))

left_join(pheno_sex_table, geno_sex_table, by = c("year" = "year")) %>%
  relocate(year, pheno_sex_ratio, geno_sex_ratio)
dedup %>% 
  filter(type == "hatchery_outplant") %>% 
  mutate(pheno_sex_2 = case_when(pheno_sex == "J" ~ "M",
                                 TRUE ~ pheno_sex)) %>%
  count(year, pheno_sex) %>%
  pivot_wider(id_cols = year, names_from = "pheno_sex", values_from = n)

How well do they correspond at the individual level?

dedup %>% 
  filter(type == "hatchery_outplant") %>%
  group_by(year) %>%
  summarise(geno_pheno_agree = mean(geno_sex == pheno_sex, na.rm=TRUE))

What about jacks?

dedup %>% 
  filter(type == "hatchery_outplant", pheno_sex == "J") %>%
  group_by(year) %>%
  summarise(geno_pheno_agree = mean(geno_sex == "M", na.rm=TRUE))
dedup %>% 
  filter(type == "hatchery_outplant", pheno_sex == "J") %>%
  summarise(geno_pheno_agree = mean(geno_sex == "M", na.rm=TRUE))

This last result (98% of jacks are genotypically male), because it is extremely unlikely to misidentify a jack for a juvenile female in Chinook salmon. Therefore we can assume all fish identified as jacks are true genetic males and the mismatch provides an estimate of genotyping error.